Hi Markus,

I've been vaguely keeping up with your and Julien's work on this.

I would really like to get a test case for this, though! I'll try working
towards it as a sub-target of another issue. For reference, Apache Any23 has
a Tika MIME type detection test case here [1] and a Tika document encoding
detection test here [2]. These may or may not be worth porting over to o.a.n.

wdyt?

Thanks

Lewis

[1] https://svn.apache.org/viewvc/incubator/any23/trunk/core/src/test/java/org/apache/any23/mime/TikaMIMETypeDetectorTest.java?view=markup
[2] https://svn.apache.org/viewvc/incubator/any23/trunk/core/src/test/java/org/apache/any23/encoding/TikaEncodingDetectorTest.java?view=markup

On Tue, Feb 14, 2012 at 11:51 PM, Markus Jelsma wrote:

> Hi,
>
> This was indeed an issue until today. The detected type is in the crawl
> datum metadata.
>
> https://issues.apache.org/jira/browse/NUTCH-1259
>
> > Hi,
> >
> > I can't see anywhere within our parser plugins where we detect encoding
> > of documents. I've also begun looking through the o.a.n.p package but
> > again I can't see anything.
> >
> > Can anyone provide some detail on this please?
> >
> > Thank you
> >
> > Lewis
>

-- 
*Lewis*