[ https://issues.apache.org/jira/browse/TIKA-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283830#comment-16283830 ]

Hudson commented on TIKA-2519:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1408 (See [https://builds.apache.org/job/Tika-trunk/1408/])
Fix thread-safety in ChmExtractor (TIKA-2519). (tallison: [https://github.com/apache/tika/commit/2169cae44277a18430e4de462b4ae5b1dfb8956b])
* (edit) CHANGES.txt
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/chm/lzx/ChmBlockInfo.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/chm/core/ChmExtractor.java
* (edit) tika-parsers/src/test/java/org/apache/tika/TestParsers.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmExtraction.java
* (add) tika-core/src/test/java/org/apache/tika/MultiThreadedTikaTest.java
TIKA-2519 clean up, fix bug in MultiThreadedTikaTest files that failed (tallison: [https://github.com/apache/tika/commit/95baca2b58538ec1d75fc5b6c80fd06b7eebb7dc])
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmExtraction.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/chm/core/ChmExtractor.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmBlockInfo.java
* (edit) tika-core/src/test/java/org/apache/tika/MultiThreadedTikaTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/chm/lzx/ChmBlockInfo.java


> Issue parsing multiple CHM files concurrently
> ---------------------------------------------
>
>                 Key: TIKA-2519
>                 URL: https://issues.apache.org/jira/browse/TIKA-2519
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.16
>            Reporter: Eamonn Saunders
>            Priority: Blocker
>             Fix For: 1.17
>
>
> Should I expect to be able to parse multiple CHM files concurrently in 
> multiple threads?
> What I'm noticing when attempting to parse 2 different CHM files in different 
> threads is that:
> - ChmExtractor.extractChmEntry() gets a ChmBlockInfo as follows:
> {code}
>                 ChmBlockInfo bb = ChmBlockInfo.getChmBlockInfoInstance(
>                         directoryListingEntry, (int) getChmLzxcResetTable()
>                                 .getBlockLen(), getChmLzxcControlData());
> {code}
> - ChmBlockInfo.getChmBlockInfoInstance() is a static method that appears to 
> limit the number of ChmBlockInfo instances to 1.
> {code}
>     public static ChmBlockInfo getChmBlockInfoInstance(
>             DirectoryListingEntry dle, int bytesPerBlock,
>             ChmLzxcControlData clcd) {
>         setChmBlockInfo(new ChmBlockInfo());
>         getChmBlockInfo().setStartBlock(dle.getOffset() / bytesPerBlock);
>         getChmBlockInfo().setEndBlock(
>                 (dle.getOffset() + dle.getLength()) / bytesPerBlock);
>         getChmBlockInfo().setStartOffset(dle.getOffset() % bytesPerBlock);
>         getChmBlockInfo().setEndOffset(
>                 (dle.getOffset() + dle.getLength()) % bytesPerBlock);
>         // potential problem with casting long to int
>         getChmBlockInfo().setIniBlock(
>                 getChmBlockInfo().startBlock - getChmBlockInfo().startBlock
>                         % (int) clcd.getResetInterval());
> //                (getChmBlockInfo().startBlock - getChmBlockInfo().startBlock)
> //                        % (int) clcd.getResetInterval());
>         return getChmBlockInfo();
>     }
> {code}
> Is there a good reason why there should only ever be one instance of 
> ChmBlockInfo?
> Should we forget about attempting to process CHM files in parallel and 
> instead queue them up to be processed sequentially?
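The problem described above is that the static factory first writes to a single static field and then reads it back, so two threads that call it at the same time can overwrite each other's values between `setChmBlockInfo(...)` and `return getChmBlockInfo()`. Below is a minimal sketch of a race-free alternative. It assumes a simplified, hypothetical `BlockInfo` class; the names `BlockInfoSketch`, `BlockInfo.of` and `runConcurrently` are illustrative, are not Tika's API, and do not necessarily reflect what the committed fix does. Each call builds and returns a new immutable object, so there is no shared state to race on. The sketch also spells out the operator precedence of the `iniBlock` expression, since the commented-out alternative in the quoted code would always evaluate to 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockInfoSketch {

    /**
     * Hypothetical, simplified stand-in for ChmBlockInfo (not Tika's class).
     * The factory returns a new immutable object on every call instead of
     * storing it in a static field, so concurrent callers never share state.
     */
    public static final class BlockInfo {
        public final long startBlock;
        public final long endBlock;
        public final long startOffset;
        public final long endOffset;
        public final long iniBlock;

        private BlockInfo(long startBlock, long endBlock, long startOffset,
                          long endOffset, long iniBlock) {
            this.startBlock = startBlock;
            this.endBlock = endBlock;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.iniBlock = iniBlock;
        }

        public static BlockInfo of(long offset, long length,
                                   int bytesPerBlock, long resetInterval) {
            long start = offset / bytesPerBlock;
            long end = (offset + length) / bytesPerBlock;
            // '%' binds tighter than '-', so this rounds start down to the
            // nearest multiple of resetInterval. Writing it as
            // (start - start) % resetInterval would always yield 0.
            long ini = start - start % resetInterval;
            return new BlockInfo(start, end, offset % bytesPerBlock,
                    (offset + length) % bytesPerBlock, ini);
        }
    }

    /**
     * Calls the factory repeatedly from several threads, each with its own
     * offset. Returns true if every thread always got back values computed
     * from its own input.
     */
    public static boolean runConcurrently(int threads, int iterations) throws Exception {
        final int bytesPerBlock = 64;
        final long resetInterval = 4;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final long offset = 1000L * (t + 1); // distinct input per thread
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < iterations; i++) {
                        BlockInfo info = BlockInfo.of(offset, 500, bytesPerBlock, resetInterval);
                        if (info.startBlock != offset / bytesPerBlock) {
                            return false; // another thread's values leaked in
                        }
                    }
                    return true;
                }));
            }
            for (Future<Boolean> f : futures) {
                if (!f.get()) {
                    return false;
                }
            }
            return true;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        BlockInfo info = BlockInfo.of(1000, 500, 64, 4);
        // prints "15 23 12"
        System.out.println(info.startBlock + " " + info.endBlock + " " + info.iniBlock);
        System.out.println("consistent: " + runConcurrently(4, 100_000));
    }
}
```

For an entry at offset 1000 with length 500, 64-byte blocks and a reset interval of 4, `main` should print `15 23 12` and then `consistent: true`. Returning a fresh object per call makes the factory safe without any locking. Synchronizing the static factory would also work, but it would serialize every caller even though the computation needs no shared state.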



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
