[
https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242310#comment-13242310
]
Bernd Fehling edited comment on SOLR-3011 at 3/30/12 1:05 PM:
--------------------------------------------------------------
I just tried the multi-threaded mode. It starts the requested number of
threads (verified in the debugger), but the import only runs once.
My configuration is:
{code:xml}
<dataConfig>
  <dataSource name="filetraverser" type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="basedata" processor="FileListEntityProcessor"
            threads="4"
            rootEntity="false"
            fileName="\.xml$"
            recursive="true"
            dataSource="null"
            baseDir="/srv/www/solr/DATA/OAI">
      <entity name="records" processor="XPathEntityProcessor"
              threads="4"
              rootEntity="true"
              dataSource="filetraverser"
              stream="true"
              forEach="/documents/document"
              url="${basedata.fileAbsolutePath}">
        <field column="id" xpath="/documents/document/@id" />
        <field column="dctitle"
               xpath="/documents/document/element[@name='dctitle']/value" />
      </entity>
    </entity>
  </document>
</dataConfig>
{code}
It should read all files below baseDir and build documents from the records
inside those files.
This works fine in single-threaded mode, but in multi-threaded mode only the
first file is read.
Any idea?
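For comparison, the set of files the outer entity is expected to pass to the
inner entity can be approximated outside Solr. This is only a sketch: the
class and method names are hypothetical, and it assumes that the fileName
regex is applied to the bare file name with find() semantics (a partial
match), which is how I read FileListEntityProcessor but have not confirmed
for every version.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ExpectedFileList {

    // Returns every regular file below baseDir whose file name matches the
    // regex (partial match via find(), an assumption), sorted by path.
    static List<Path> listMatching(Path baseDir, String fileNameRegex) throws IOException {
        Pattern pattern = Pattern.compile(fileNameRegex);
        try (Stream<Path> paths = Files.walk(baseDir)) { // recursive, like recursive="true"
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> pattern.matcher(p.getFileName().toString()).find())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the baseDir as the first argument; defaults to the current directory.
        Path baseDir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listMatching(baseDir, "\\.xml$")) {
            System.out.println(p.toAbsolutePath());
        }
    }
}
```

If this lists more than one file for the baseDir while the multi-threaded
import indexes records from only one, the traversal itself is not the
problem.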
One more thing: TestThreaded.java contains these lines:
{code:java}
@Test
public void testCachedThreadless_FullImport() throws Exception {
  runFullImport(getCachedConfig(random.nextBoolean(), random.nextBoolean(), 0));
}

@Test
public void testCachedSingleThread_FullImport() throws Exception {
  runFullImport(getCachedConfig(random.nextBoolean(), random.nextBoolean(), 1));
}

@Test
public void testCachedThread_FullImport() throws Exception {
  int numThreads = random.nextInt(9) + 1; // between one and 10
  String config = getCachedConfig(random.nextBoolean(), random.nextBoolean(),
      numThreads);
  runFullImport(config);
}
{code}
These tests cover 0 threads, 1 thread, and a random count from 1 to 9, so 1 is
covered twice.
Wouldn't "random.nextInt(8) + 2" be better, giving the range 2 to 9?
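To double-check the arithmetic: java.util.Random.nextInt(n) returns a value
from 0 (inclusive) to n (exclusive), so adding an offset shifts both ends of
the range. The small program below samples both expressions and reports the
smallest and largest value seen. The class name, the helper, and the fixed
seed are hypothetical and only there to make the run repeatable; the test
itself uses its own random instance.

```java
import java.util.Random;
import java.util.TreeSet;

public class ThreadCountRange {

    // Collects the distinct values of rnd.nextInt(bound) + offset over many draws.
    static TreeSet<Integer> observed(int bound, int offset, int draws) {
        Random rnd = new Random(42L); // fixed seed (an assumption), for repeatability only
        TreeSet<Integer> seen = new TreeSet<>();
        for (int i = 0; i < draws; i++) {
            seen.add(rnd.nextInt(bound) + offset);
        }
        return seen;
    }

    public static void main(String[] args) {
        TreeSet<Integer> current = observed(9, 1, 10_000);  // expression in the test today
        TreeSet<Integer> proposed = observed(8, 2, 10_000); // suggested replacement
        System.out.println("nextInt(9) + 1 -> " + current.first() + ".." + current.last());
        System.out.println("nextInt(8) + 2 -> " + proposed.first() + ".." + proposed.last());
    }
}
```

With enough draws the first line should report the range 1 to 9 and the second
2 to 9, so the existing comment "between one and 10" overstates the upper
bound, and the suggested expression drops only the value 1.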
> DIH MultiThreaded bug
> ---------------------
>
> Key: SOLR-3011
> URL: https://issues.apache.org/jira/browse/SOLR-3011
> Project: Solr
> Issue Type: Sub-task
> Components: contrib - DataImportHandler
> Affects Versions: 3.5
> Reporter: Mikhail Khludnev
> Assignee: James Dyer
> Priority: Minor
> Fix For: 3.6
>
> Attachments: SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch,
> SOLR-3011.patch, SOLR-3011.patch,
> patch-3011-EntityProcessorBase-iterator.patch,
> patch-3011-EntityProcessorBase-iterator.patch
>
>
> The current DIH design is not thread safe; see the last comments at
> SOLR-2382 and SOLR-2947. I'm going to provide a patch that makes the DIH
> core thread safe. It is mostly the SOLR-2947 patch from 28th Dec.