[ https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981144#comment-16981144 ]
Nicholas DiPiazza commented on TIKA-2224:
-----------------------------------------
Dear watchers of this issue:
I am working on a OneNote parser for Tika, and I'm at the point where I need
some help with the inner workings of OneNote documents.
Here is the project so far:
https://github.com/nddipiazza/onenote-parser-java
Basically I just need some help understanding some of the finer details of the
OneNote format and how to extract info from it.
https://stackoverflow.com/questions/59008205/onenote-parsing-how-to-get-to-the-text-blobs-in-the-document
https://stackoverflow.com/questions/59020176/onenote-not-able-to-find-all-the-property-ids-in-the-microsoft-documentation
If anyone has a moment, could you please drop in, take a peek at the source,
and see if you can answer my questions?
> OneNote formats support - Mime Magic and Parser
> -----------------------------------------------
>
> Key: TIKA-2224
> URL: https://issues.apache.org/jira/browse/TIKA-2224
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Affects Versions: 1.14
> Reporter: Nick Burch
> Priority: Major
> Attachments: Sample1.json, Sample1.one, note-ssn-test-mmmm.one
>
>
> As raised at
> http://stackoverflow.com/questions/41272195/onenote-support-for-apache-tika-parsers,
> we don't have any magic for the OneNote formats. Several years ago we dug
> out the file format specs (see
> http://lucene.472066.n3.nabble.com/Tika-OneNote-Support-td4020393.html), but
> didn't have volunteer energy to implement a parser. However, armed with those
> specs, we should be able to come up with some mime magic for detection
--
This message was sent by Atlassian Jira
(v8.3.4#803005)