[ 
https://issues.apache.org/jira/browse/NUTCH-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591553#comment-13591553
 ] 

Lewis John McGibbney commented on NUTCH-1537:
---------------------------------------------

Yeah, there is a fair bit of duplication, which was actually the initial driver 
for me to improve this aspect of Nutch. Over time we can work to reduce and 
certainly improve the code.
Regarding the above
1. I agree here. We don't need to implement everything from Apache Tika, and 
we would be wasting our time if we tried. If/when one of us moves on, it 
becomes a pain for new and existing developers to manage the code.
2. Well it is not as if we are moving away from Apache Tika any time soon. 
There was a huge effort to move the Tika stuff out of Nutch, which meant that 
we have a direct dependency upon the project. Though some can see this 
dependency as a limitation on the Nutch side, Tika are making releases and the 
community seems to be in a healthy state so I don't personally consider this as 
a limitation. If things change in Tika, then we change them in Nutch if and 
when we can. Until then we make best use of the code. I would not disagree 
with your suggestion on this one.
3. I don't see the additional tika-core libraries as an issue here. If we use 
the code in a more inclusive (dependency-rich) way, then I think overall it 
is better for Nutch.

Thanks for providing the explicit options as above Sebastian. I think for the 
time being we should try to get consensus on which one(s) to progress with.
                
> Legacy metadata package needs to take advantage of Apache Tika metadata 
> package more.
> -------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1537
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1537
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.6, 2.1
>            Reporter: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.7, 2.2
>
>
> In Nutch, classes from the metadata package are being used in quite a number 
> of places. It is not currently being used to reflect the work going on in 
> Apache Tika and we need to better leverage the vocabularies available to us 
> from the dependency on Apache Tika.
> The introduction of TikaCoreProperties in Tika 1.2 is not currently leveraged 
> in Nutch. This is just one example of an improved way for us to add metadata 
> to Nutch documents.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
