[ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15636102#comment-15636102
 ] 

Nick Burch commented on TIKA-2146:
----------------------------------

My guess is it's about 2-3 weeks of work at the POI level to add support for 
this. Unless you've got a handy intern or some budget, it looks unlikely it'll 
be fixed soon...

However, it's probably only 2-3 hours of work to read through the published 
.DOC file format specs from Microsoft and find out how encrypted Word documents 
are marked as such in the file. You probably want 
https://msdn.microsoft.com/en-us/library/office/gg615596(v=office.14).aspx then 
https://msdn.microsoft.com/en-us/library/office/cc313153(v=office.12).aspx . 
Once someone has found that out, it's only a few minutes' work to add the check 
and throw a more helpful exception.
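
For illustration only, here is a minimal sketch of what such a check might look 
like. It is not the actual Tika or POI fix. It assumes, from a reading of the 
[MS-DOC] spec that should be verified before relying on it, that the 
WordDocument stream starts with a FibBase whose wIdent is 0xA5EC and whose 
16-bit flag field at offset 0x0A carries fEncrypted in bit 0x0100. The class 
name WordEncryptionCheck and the method isEncrypted are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch: read the encryption flag from the header (FIB) of a binary .doc. */
final class WordEncryptionCheck {
    // Assumed layout, taken from a reading of [MS-DOC]; verify against the spec.
    private static final int WIDENT_WORD = 0xA5EC;  // FibBase.wIdent magic number
    private static final int FLAGS_OFFSET = 0x0A;   // 16-bit flag field in FibBase
    private static final int F_ENCRYPTED = 0x0100;  // fEncrypted bit in that field

    private WordEncryptionCheck() {}

    /**
     * @param wordDocumentStream leading bytes of the OLE2 "WordDocument" stream
     * @return true if the fEncrypted bit is set
     * @throws IllegalArgumentException if the bytes do not look like a Word FIB
     */
    static boolean isEncrypted(byte[] wordDocumentStream) {
        if (wordDocumentStream == null
                || wordDocumentStream.length < FLAGS_OFFSET + 2) {
            throw new IllegalArgumentException("Too short to contain a FibBase");
        }
        // All multi-byte fields in the FIB are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(wordDocumentStream)
                .order(ByteOrder.LITTLE_ENDIAN);
        int wIdent = buf.getShort(0) & 0xFFFF;
        if (wIdent != WIDENT_WORD) {
            throw new IllegalArgumentException("Not a Word binary document header");
        }
        int flags = buf.getShort(FLAGS_OFFSET) & 0xFFFF;
        return (flags & F_ENCRYPTED) != 0;
    }
}
```

A caller could run this before constructing HWPFDocument and, when it returns 
true, throw something more descriptive (for example Tika's 
EncryptedDocumentException) instead of letting the 
ArrayIndexOutOfBoundsException surface.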

> Unable to extract contents from protected MS 
> word-doc-java.lang.ArrayIndexOutOfBoundsException
> ----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2146
>                 URL: https://issues.apache.org/jira/browse/TIKA-2146
>             Project: Tika
>          Issue Type: Bug
>          Components: core, parser
>    Affects Versions: 1.11
>         Environment: Windows 7
>            Reporter: Sharath Kumar
>         Attachments: Test bug.doc, This is password protected.doc
>
>
> When I try to parse an MS Word document that is protected, I am unable to 
> extract the content; instead, I get the exception below:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@29402a40
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at org.apache.tika.Tika.parseToString(Tika.java:537)
>       at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:102)
>       at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:1)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:99)
>       at org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:482)
>       at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:309)
>       at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:436)
>       at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:262)
>       at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:122)
>       at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
>       at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
>       at org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
>       at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:215)
>       at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
>       at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
>       at org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:389)
>       at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:191)
>       at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
>       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
>       at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
>       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
>       at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
>       at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
>       at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>       at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
>       at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:345)
>       at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
