[
https://issues.apache.org/jira/browse/NUTCH-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591553#comment-13591553
]
Lewis John McGibbney commented on NUTCH-1537:
---------------------------------------------
Yeah, there is a fair bit of duplication, which was actually the initial driver
for me to improve this aspect of Nutch. Over time we can work to reduce the
duplication and certainly improve the code.
Regarding the points above:
1. I agree here. We don't need to implement everything from Apache Tika, and
we would be wasting our time if we did. If/when one of us moves on, it becomes
a pain for new and existing developers to manage the code.
2. Well it is not as if we are moving away from Apache Tika any time soon.
There was a huge effort to move the Tika stuff out of Nutch, which meant that
we have a direct dependency upon the project. Though some can see this
dependency as a limitation on the Nutch side, Tika are making releases and the
community seems to be in a healthy state so I don't personally consider this as
a limitation. If things change in Tika, then we change them in Nutch if and
when we can. Until then we make best use of the code. I would not disagree
with your suggestion on this one.
3. I don't see the additional tika-core libraries as an issue here. If we use
the code in a more inclusive (dependency-rich) way, then I think overall it
is better for Nutch.
Thanks for providing the explicit options above, Sebastian. I think for the
time being we should try to get consensus on which one(s) to progress with.
> Legacy metadata package needs to take advantage of Apache Tika metadata
> package more.
> -------------------------------------------------------------------------------------
>
> Key: NUTCH-1537
> URL: https://issues.apache.org/jira/browse/NUTCH-1537
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.6, 2.1
> Reporter: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.7, 2.2
>
>
> In Nutch, classes from the metadata package are being used in quite a number
> of places. It is not currently being used to reflect the work going on in
> Apache Tika and we need to better leverage the vocabularies available to us
> from the dependency on Apache Tika.
> The introduction of TikaCoreProperties in Tika 1.2 is not currently leveraged
> in Nutch. This is just one example of an improved way for us to add metadata
> to Nutch documents.