All,

I took a stab at the initial module structure based on the email thread between Tim and me [1]. If a package didn't seem to fit with anything else, I created an individual project for it. If any of the groupings don't make sense, or folks think there are better ways to organize things, I'm happy to move stuff around. Patches are welcome :). I have created a JIRA issue [2]. Committed as rev 1723223.

There's still a good amount of outstanding work:
1) All of this could use more testing, especially with the external parsers.
2) As Tim has already raised, there is the issue of maintaining two branches. There are likely some fixes in trunk that are not currently applied to the 2.0 branch.
3) The tika-parser project is currently using the maven shade plugin, and that is causing issues creating the OSGi MANIFEST.MF file. I should be able to find a way around this.
4) Still need to recreate the OSGi uber jar with all dependencies packaged with the tika code.
5) There are still some classes in the tika-parser project. Should these all be moved to core? A common project?...
6) Documentation. I could use some Wiki access. Username: BobPaulin.
7) There are some dependencies in the tika-parser project that were not needed to compile any of the individual modules or run tests. Are they still needed?
8) Where does the org.apache.tika.parser.external.CompositeExternalParser ServiceLoader (META-INF/services/org.apache.tika.parser.Parser) config belong? I moved it to tika-core since that is where the class lives.
9) Subcomponent licenses. I moved them to the modules they belong in, but I need to figure out a way to make them bubble up to the uber jars. Or perhaps they need to be maintained in both places.
10) Anything I may be forgetting....;)
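
For anyone weighing in on item 8, here is a minimal sketch of how java.util.ServiceLoader discovery works in general. DemoParser and ServiceLookupSketch are hypothetical stand-ins, not Tika classes. The point is only that a provider is found when some jar on the classpath ships a META-INF/services file named after the service type, so whichever module carries that file is the one that advertises the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceLookupSketch {

    /** Hypothetical stand-in for a service type such as org.apache.tika.parser.Parser. */
    public interface DemoParser {
        String name();
    }

    /**
     * Returns the names of all DemoParser providers visible on the classpath.
     * ServiceLoader looks for a provider-configuration file under
     * META-INF/services/ named after the binary name of the service type
     * (here that would be "ServiceLookupSketch$DemoParser"), with one
     * implementation class name per line.
     */
    public static List<String> discover() {
        List<String> names = new ArrayList<>();
        for (DemoParser parser : ServiceLoader.load(DemoParser.class)) {
            names.add(parser.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // With no provider-configuration file on the classpath, the list is empty.
        System.out.println("Discovered providers: " + discover());
    }
}
```

If no jar on the classpath contains the configuration file, nothing is discovered, which is why the file needs to travel with whichever jar is meant to register the class.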

For the most part, the changes just reorganize the existing packages. There are a handful of changes to the test suite in order to break some cyclic dependencies. Here's an overview of how the projects interrelate at the moment:

tika-parser-modules
 - /tika-advanced-module
 - /tika-cad-module
           -> tika-text-module [test]
 - /tika-code-module
           -> tika-text-module [test]
 - /tika-database-module
           -> tika-office-module [test]
 - /tika-ebook-module
           -> tika-text-module
 - /tika-journal-module
           -> tika-pdf-module
 - /tika-multimedia-module
           -> tika-web-module [test]
           -> tika-office-module [test]
           -> tika-pdf-module [test]
 - /tika-office-module
           -> tika-web-module [test]
           -> tika-package-module [test]
           -> tika-text-module [test]
 - /tika-package-module
 - /tika-pdf-module
           -> tika-text-module [test]
           -> tika-package-module [test]
           -> tika-office-module [test]
 - /tika-scientific-module
           -> tika-text-module [test]
 - /tika-text-module
 - /tika-web-module
           -> tika-text-module [test]
           -> tika-package-module [test]
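
As a sanity check on the listing above, here is a rough sketch that encodes those edges and walks them depth-first to produce one valid build order, failing if a cycle shows up. The ModuleGraph class and its methods are made up for illustration and are not part of the commit. Compile-scope and test-scope edges are merged here, which is stricter than what Maven itself requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ModuleGraph {

    /** Edges copied from the listing above; compile and test scope are merged. */
    public static Map<String, List<String>> dependencies() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("tika-advanced-module", Collections.<String>emptyList());
        deps.put("tika-cad-module", Arrays.asList("tika-text-module"));
        deps.put("tika-code-module", Arrays.asList("tika-text-module"));
        deps.put("tika-database-module", Arrays.asList("tika-office-module"));
        deps.put("tika-ebook-module", Arrays.asList("tika-text-module"));
        deps.put("tika-journal-module", Arrays.asList("tika-pdf-module"));
        deps.put("tika-multimedia-module", Arrays.asList(
                "tika-web-module", "tika-office-module", "tika-pdf-module"));
        deps.put("tika-office-module", Arrays.asList(
                "tika-web-module", "tika-package-module", "tika-text-module"));
        deps.put("tika-package-module", Collections.<String>emptyList());
        deps.put("tika-pdf-module", Arrays.asList(
                "tika-text-module", "tika-package-module", "tika-office-module"));
        deps.put("tika-scientific-module", Arrays.asList("tika-text-module"));
        deps.put("tika-text-module", Collections.<String>emptyList());
        deps.put("tika-web-module", Arrays.asList(
                "tika-text-module", "tika-package-module"));
        return deps;
    }

    /** Returns the modules ordered so that each one comes after everything it depends on. */
    public static List<String> buildOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String module : deps.keySet()) {
            visit(module, deps, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String module, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress,
                              List<String> order) {
        if (done.contains(module)) {
            return;
        }
        // Seeing a module again while it is still being visited means a cycle.
        if (!inProgress.add(module)) {
            throw new IllegalStateException("Cycle detected at " + module);
        }
        List<String> direct = deps.get(module);
        if (direct != null) {
            for (String dependency : direct) {
                visit(dependency, deps, done, inProgress, order);
            }
        }
        inProgress.remove(module);
        done.add(module);
        order.add(module);
    }

    public static void main(String[] args) {
        for (String module : buildOrder(dependencies())) {
            System.out.println(module);
        }
    }
}
```

Running main prints one possible ordering. If a later regrouping reintroduces a cycle, buildOrder throws instead of returning.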

I'm very interested in feedback: we have been talking about this for a while, but I'm sure actually seeing it will generate more discussion. How much simpler the individual pom files have become does seem to demonstrate that this will be a good thing for the project.

Cheers,

- Bob

[1] http://mail-archives.apache.org/mod_mbox/tika-dev/201508.mbox/%3C55CF4C19.6050503%40bobpaulin.com%3E
[2] https://issues.apache.org/jira/browse/TIKA-1824
