Hi Chris,

On Tue, Apr 17, 2012 at 3:20 PM, Mattmann, Chris A (388J) <[email protected]> wrote:
> I think it would be super awesome to add the Any23 parsing functionality
> as a Tika parser, and potentially an extension to the MIME repository to
> detect microformats, etc. Then in Nutch, we could take advantage of the
> any23 parser with the existing tika-parser interface.
>
> Thoughts?

Well, on top of what I've managed to ramble on about elsewhere, this utopian vision is something I will definitely be pushing for. It makes perfect sense, but I think Any23 needs to mature within the incubator before we can push this up to the Tika PMC. It would be a win-win situation, as Nutch would benefit too.

For the time being, I think a step back, in the form of a (Tika wrapper) plugin implementing the Any23 functionality, would be a good start. I'll be making headway on this over the next while, so I will keep you all up to date.

Thanks
Lewis

