[ https://issues.apache.org/jira/browse/TIKA-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283830#comment-16283830 ]
Hudson commented on TIKA-2519:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1408 (See [https://builds.apache.org/job/Tika-trunk/1408/])
Fix thread-safety in ChmExtractor (TIKA-2519). (tallison: [https://github.com/apache/tika/commit/2169cae44277a18430e4de462b4ae5b1dfb8956b])
* (edit) CHANGES.txt
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/chm/lzx/ChmBlockInfo.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/chm/core/ChmExtractor.java
* (edit) tika-parsers/src/test/java/org/apache/tika/TestParsers.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmExtraction.java
* (add) tika-core/src/test/java/org/apache/tika/MultiThreadedTikaTest.java
TIKA-2519 clean up, fix bug in MultiThreadedTikaTest files that failed (tallison: [https://github.com/apache/tika/commit/95baca2b58538ec1d75fc5b6c80fd06b7eebb7dc])
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmExtraction.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/chm/core/ChmExtractor.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmBlockInfo.java
* (edit) tika-core/src/test/java/org/apache/tika/MultiThreadedTikaTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/chm/lzx/ChmBlockInfo.java


> Issue parsing multiple CHM files concurrently
> ---------------------------------------------
>
>                 Key: TIKA-2519
>                 URL: https://issues.apache.org/jira/browse/TIKA-2519
>             Project: Tika
>          Issue Type: Bug
>   Affects Versions: 1.16
>            Reporter: Eamonn Saunders
>            Priority: Blocker
>             Fix For: 1.17
>
>
> Should I expect to be able to parse multiple CHM files concurrently in multiple threads?
> What I'm noticing when attempting to parse 2 different CHM files in different threads is that:
> - ChmExtractor.extractChmEntry() gets a ChmBlockInfo as follows:
> {code}
> ChmBlockInfo bb = ChmBlockInfo.getChmBlockInfoInstance(
>         directoryListingEntry, (int) getChmLzxcResetTable()
>                 .getBlockLen(), getChmLzxcControlData());
> {code}
> - ChmBlockInfo.getChmBlockInfoInstance() is a static method that appears to limit the number of ChmBlockInfo instances to 1.
> {code}
> public static ChmBlockInfo getChmBlockInfoInstance(
>         DirectoryListingEntry dle, int bytesPerBlock,
>         ChmLzxcControlData clcd) {
>     setChmBlockInfo(new ChmBlockInfo());
>     getChmBlockInfo().setStartBlock(dle.getOffset() / bytesPerBlock);
>     getChmBlockInfo().setEndBlock(
>             (dle.getOffset() + dle.getLength()) / bytesPerBlock);
>     getChmBlockInfo().setStartOffset(dle.getOffset() % bytesPerBlock);
>     getChmBlockInfo().setEndOffset(
>             (dle.getOffset() + dle.getLength()) % bytesPerBlock);
>     // potential problem with casting long to int
>     getChmBlockInfo().setIniBlock(
>             getChmBlockInfo().startBlock - getChmBlockInfo().startBlock
>                     % (int) clcd.getResetInterval());
>     // (getChmBlockInfo().startBlock - getChmBlockInfo().startBlock)
>     // % (int) clcd.getResetInterval());
>     return getChmBlockInfo();
> }
> {code}
> Is there a good reason why there should only ever be one instance of ChmBlockInfo?
> Should we forget about attempting to process CHM files in parallel and instead queue them up to be processed sequentially?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
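
For illustration only: the factory quoted in the issue writes every intermediate value through a shared static field, so two threads calling it at once can overwrite each other's ChmBlockInfo. The following is a minimal sketch of one way to avoid that, by building each result from local variables and returning a fresh instance. It is not necessarily what the commits above do. `BlockInfoSketch`, `of`, `countMismatches`, and the plain `long`/`int` parameters are hypothetical stand-ins for ChmBlockInfo, DirectoryListingEntry, and ChmLzxcControlData, not Tika's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, simplified stand-in for ChmBlockInfo (not Tika's class).
final class BlockInfoSketch {
    final int startBlock;
    final int endBlock;
    final int startOffset;
    final int endOffset;
    final int iniBlock;

    private BlockInfoSketch(int startBlock, int endBlock,
                            int startOffset, int endOffset, int iniBlock) {
        this.startBlock = startBlock;
        this.endBlock = endBlock;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.iniBlock = iniBlock;
    }

    // Same arithmetic as the quoted factory, but computed on locals only:
    // no static field is read or written, so concurrent callers cannot
    // clobber each other's result.
    static BlockInfoSketch of(long offset, long length,
                              int bytesPerBlock, int resetInterval) {
        int startBlock = (int) (offset / bytesPerBlock);
        int endBlock = (int) ((offset + length) / bytesPerBlock);
        int startOffset = (int) (offset % bytesPerBlock);
        int endOffset = (int) ((offset + length) % bytesPerBlock);
        int iniBlock = startBlock - startBlock % resetInterval;
        return new BlockInfoSketch(startBlock, endBlock,
                startOffset, endOffset, iniBlock);
    }

    // Calls of(...) from several threads and counts results whose
    // startBlock differs from the value expected for that task's offset.
    static int countMismatches(int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final long offset = 1000L * i;
                results.add(pool.submit(() -> {
                    BlockInfoSketch info = of(offset, 500L, 32768, 2);
                    return info.startBlock == (int) (offset / 32768);
                }));
            }
            int mismatches = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    mismatches++;
                }
            }
            return mismatches;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every field is final and each call returns its own object, a caller such as `BlockInfoSketch.of(offset, length, blockLen, resetInterval)` needs no locking, and parsing files in parallel would not need to be serialized for this reason alone.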