[
https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090285#comment-15090285
]
Bob Paulin edited comment on TIKA-1824 at 1/9/16 1:43 AM:
----------------------------------------------------------
* Perhaps rename artifact names in parser sub-components to include
"Parser(s?)", e.g. Apache Tika Parser Advanced Module so that the names sort
more clearly (at least in the Maven window in IntelliJ)?
I originally felt it was redundant, but in a Maven repo it could be helpful, so
I can make that change.
* Perhaps add "parser(s?)" to the artifactId, e.g. tika-parser-cad-module
Same as above.
* Perhaps lowercase names in parser-subcomponents so that they're in line with
legacy: "Apache Tika parser advanced module"
I think I'm missing where this convention is coming from.
* Pkcs7Parser ... should that be under advanced...or somewhere else ...own
crypto package?
I don't feel strongly that it needs to be under advanced, but I do want to be
careful not to overdo the number of modules. Do you feel crypto has room for
growth, or is this going to be a one-parser project forever?
* iwork ...should we move that to office?
Actually, I had it that way initially. The issue is that the iwork parser is
used inside ZipContainerDetector, which makes the dependency graph awkward. We
would need to find a way to break that dependency to make this work.
* tika-test-resources...should we move TikaTest into that and change the name
to tika-test? I have a vague memory of wanting to carve out a separate test
package earlier and adding TikaTest and something else...
I think it could work in tika-core or tika-test. I don't feel strongly either
way.
* OutlookPSTParser...move that to office?
I'd like to keep this class with all the other mbox classes. Maybe move mbox
to office?
* Does MBox belong in web? Not sure where to put it?
Move to office?
* Move CommonsDigester to core if we're willing to add a dependency on
commons-codec into core?
I'm fine with this.
* Move Activator to tika-bundle?
I believe tika-bundle already has an activator. Could just remove this.
* Move pot to multimedia or add tika-parsers-multimedia-advanced-module?
Not sure I understand POT in multimedia. Can you elaborate?
* Move geo.topic to "advanced"...perhaps we rename "advanced" to ner?
Is ner only applied to geo? My understanding of this domain is limited.
* Move ctakes to "advanced/ner"?
Again, my understanding of the domain is too limited to say what ctakes fits with.
* Collapse web and text?
Not sure I like that since a number of modules depend on text but not web.
Seems like we'd be adding a lot of needless dependencies.
was (Author: bobpaulin):
* Perhaps rename artifact names in parser sub-components to include
"Parser(s?)", e.g. Apache Tika Parser Advanced Module so that the names sort
more clearly (at least in the Maven window in IntelliJ)?
I originally felt it was redundant, but in a Maven repo it could be helpful, so
I can make that change.
* Perhaps add "parser(s?)" to the artifactId, e.g. tika-parser-cad-module
Same as above.
* Perhaps lowercase names in parser-subcomponents so that they're in line with
legacy: "Apache Tika parser advanced module"
I think I'm missing where this convention is coming from.
* Pkcs7Parser ... should that be under advanced...or somewhere else ...own
crypto package?
I don't feel strongly that it needs to be under advanced, but I do want to be
careful not to overdo the number of modules. Do you feel crypto has room for
growth, or is this going to be a one-parser project forever?
* iwork ...should we move that to office?
I think it could fit there too. No issues moving.
* tika-test-resources...should we move TikaTest into that and change the name
to tika-test? I have a vague memory of wanting to carve out a separate test
package earlier and adding TikaTest and something else...
I think it could work in tika-core or tika-test. I don't feel strongly either
way.
* OutlookPSTParser...move that to office?
I'd like to keep this class with all the other mbox classes. Maybe move mbox
to office?
* Does MBox belong in web? Not sure where to put it?
Move to office?
* Move CommonsDigester to core if we're willing to add a dependency on
commons-codec into core?
I'm fine with this.
* Move Activator to tika-bundle?
I believe tika-bundle already has an activator. Could just remove this.
* Move pot to multimedia or add tika-parsers-multimedia-advanced-module?
Not sure I understand POT in multimedia. Can you elaborate?
* Move geo.topic to "advanced"...perhaps we rename "advanced" to ner?
Is ner only applied to geo? My understanding of this domain is limited.
* Move ctakes to "advanced/ner"?
Again, my understanding of the domain is too limited to say what ctakes fits with.
* Collapse web and text?
Not sure I like that since a number of modules depend on text but not web.
Seems like we'd be adding a lot of needless dependencies.
> Tika 2.0 - Create Initial Parser Modules
> -----------------------------------------
>
> Key: TIKA-1824
> URL: https://issues.apache.org/jira/browse/TIKA-1824
> Project: Tika
> Issue Type: Improvement
> Affects Versions: 2.0
> Reporter: Bob Paulin
> Assignee: Bob Paulin
>
> Create initial break down of parser modules.