[
https://issues.apache.org/jira/browse/TIKA-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sachin Shaju updated TIKA-1966:
-------------------------------
Description:
I was trying to parse iWork documents with Apache Tika, but instead of the
parsed content I am getting some other output from the content handler.
The code snippet that I used is attached to this issue.
Output :-
Contents of the file :
Index/Document.iwa
Index/ViewState.iwa
Index/CalculationEngine.iwa
Index/Tables/HeaderStorageBucket-2.iwa
Index/Tables/Tile.iwa
Index/Metadata.iwa
Metadata/Properties.plist
I'm able to detect the file type correctly using the Detector API, but I'm not
getting the useful content out of the document.
I'm attaching the iWork documents that I tested with (made with the latest
version of iOS). It worked when I tested with older versions. Thanks.
was:
I was trying to parse iWork documents with Apache Tika, but instead of the
parsed content I am getting some other output from the content handler.
The code snippet that I used and the output I got are added below.
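import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Called with e.g. new File("/home/user/tika/samples/budget.numbers")
private void parseFile(File file) {
    // try-with-resources closes the stream even if parsing fails
    try (InputStream inputStream = new FileInputStream(file)) {
        ParseContext context = new ParseContext();
        // -1 disables the write limit of the handler
        BodyContentHandler bodyHandler = new BodyContentHandler(-1);
        Parser parser = new AutoDetectParser();
        parser.parse(inputStream, bodyHandler, new Metadata(), context);
        System.out.println("Contents of the file :" + bodyHandler.toString());
    } catch (IOException | SAXException | TikaException e) {
        e.printStackTrace();
    }
}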
Output :-
Contents of the file :
Index/Document.iwa
Index/ViewState.iwa
Index/CalculationEngine.iwa
Index/Tables/HeaderStorageBucket-2.iwa
Index/Tables/Tile.iwa
Index/Metadata.iwa
Metadata/Properties.plist
I'm able to detect the file type correctly using the Detector API, but I'm not
getting the useful content out of the document.
I'm attaching the iWork documents that I tested with (made with the latest
version of iOS). It worked when I tested with older versions. Thanks.
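The output above is a list of archive entry names rather than document text, which suggests the file is a zip package whose entries are being enumerated instead of decoded. The following is a minimal sketch, using only java.util.zip, for checking whether a given document has the ".iwa"-based layout seen in that listing. The class name IWorkPackageCheck, its method names, and the rule "any entry ending in .iwa indicates the newer layout" are assumptions made for illustration; none of them are part of Tika's API.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Apache Tika.
class IWorkPackageCheck {

    // Collects the entry names of a zip-based document.
    static List<String> listEntryNames(File file) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(file)) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Assumption: a package containing ".iwa" entries uses the newer layout.
    static boolean hasIwaEntries(List<String> entryNames) {
        for (String name : entryNames) {
            if (name.endsWith(".iwa")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: IWorkPackageCheck <file>");
            return;
        }
        List<String> names = listEntryNames(new File(args[0]));
        for (String name : names) {
            System.out.println(name);
        }
        System.out.println("contains .iwa entries: " + hasIwaEntries(names));
    }
}
```

Running it against budget.numbers should show whether the sample is a zip package with ".iwa" entries, which would help narrow down whether the problem is specific to that layout.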
> Issue in parsing iWorksDocument with Apache Tika
> ------------------------------------------------
>
> Key: TIKA-1966
> URL: https://issues.apache.org/jira/browse/TIKA-1966
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.12
> Environment: Ubuntu 15
> Reporter: Sachin Shaju
> Attachments: budget.numbers, connors_20040127.key, pages.pages
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)