Hi Julien,

On Tue, Apr 17, 2012 at 12:03 PM, Julien Nioche <[email protected]> wrote:

> Why should they? If they are not used outside the class, then I think it is
> good practice to keep them private. Since TikaParser implements Parser, the
> only method that we can expect to be called is getParse(), and it is public.
>

Yeah, oops ;0) (duh)


>
> It could have one for testing, but IMHO using the ParserChecker is a better
> way of testing, as it is closer to real use.
>

Yeah, +1, it really is.


> I don't remember this, but I do remember suggesting that the Any23 parser
> should be a Tika parser, which is not the same as a Tika wrapper. I expect
> other people in Tika-land to have a use for it, and we'd get the benefit of
> it automatically with parse-tika.
>

Yeah, I think this would be a great goal to work towards for Any23; however,
it is not within the scope of a parse-any23 plugin for Nutch :0) Time will
tell with this one. I think we are all only just getting to know the
capabilities of Any23, so it's still early days.


>
> It depends on what you want to do. What would we get out of Any23? How would
> that be used on the search side?
>

I need to look into this. When I spoke with Paolo Castagna about it
previously, we discussed a TDB implementation that would let us scrape
structured Any23 data and send it directly to TDB. However, this is separate
from a neat indexing filter (or filters) that, for example, takes the
mimeType into account and indexes the content accordingly. The reason the
feed plugin grabbed my attention is that the FeedIndexingFilter extracts
important info from the feed and passes it on in such a way that we can
index and search conveniently and efficiently through piles of feeds. For
the latter part, I need to investigate how different formats can be
represented within an index so that we can conveniently navigate triples,
etc.

Thanks for now, Julien.

Lewis
-- 
*Lewis*
