Hi Chris,

Sorry for the late answer; I couldn't write because I was sick last week. I have checked the links. If I want to do the job, I must use them, and I will. I also need a mentor for the GSoC project. Would you consider being my mentor?

On Sat, Mar 28, 2015 at 4:53 AM, Mattmann, Chris A (3980) <[email protected]> wrote:

> Hi Remzi,
>
> Sure!
>
> Check out this 5-minute guide to writing a parser in Tika:
>
> https://tika.apache.org/1.7/parser_guide.html
>
> OK, so then check out Any23:
>
> http://any23.apache.org/
>
> It has support for parsing RDF Microformats. So, you
> may want to create a MicroformatsParser in Tika; then
> if it’s supported in Tika, it will in turn be available
> in Nutch and its parse-tika plugin if you upgrade it to
> the latest version of Tika.
>
> You can see how to do this here:
>
> http://s.apache.org/fsY
>
> Cheers and best of luck - hope that’s enough to get
> your proposal kicked off.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Remzi Düzağaç <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Friday, March 27, 2015 at 7:22 AM
> To: dev <[email protected]>
> Cc: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
> Subject: Re: GSOC RDF Microformats Support
>
> >Hi Chris,
> >
> >Thanks for your feedback.
> >I was planning to use Any23 and Tika, but I don't have a detailed grasp of
> >both projects. I guess I'm gonna need to dive into both.
> >
> >I would appreciate it if you could guide me.
> >
> >Thanks
> >
> >On Fri, Mar 27, 2015 at 4:07 PM, Mattmann, Chris A (3980)
> ><[email protected]> wrote:
> >
> >Hi Remzi - thanks! You may want to consider this as a Tika or
> >Any23 project since Nutch delegates its parsing to Tika (and
> >Any23 uses Tika [and vice versa] to handle micro formats).
> >
> >Cheers,
> >Chris
> >
> >-----Original Message-----
> >From: Remzi Düzağaç <[email protected]>
> >Reply-To: "[email protected]" <[email protected]>
> >Date: Friday, March 27, 2015 at 5:07 AM
> >To: "[email protected]" <[email protected]>
> >Subject: GSOC RDF Microformats Support
> >
> >>Hi Guys,
> >>
> >>I have sent a proposal to GSoC. I would like to add RDF microformat
> >>support to Nutch. I kindly ask for your support. Is there anyone
> >>willing to volunteer to be my mentor on this topic?
> >>
> >>Thank you very much
