Thanks for the information, Tejas.

Did you try deleting the 'crawl_parse' directory? Since the code checks for
that directory, I will try deleting it and reparsing.
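
A minimal sketch of the delete-and-reparse idea, assuming the segment sits on
the local filesystem (not HDFS) and assuming that crawl_parse, parse_data and
parse_text are the only sub-directories the parse step writes. The class and
method names are hypothetical, it uses java.nio rather than Hadoop's
FileSystem API, and it has not been tried on real segments, so back up the
segment first:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReparseSketch {
    // Assumption: these are the sub-directories written by a Nutch 1.x parse.
    static final String[] PARSE_DIRS = {"crawl_parse", "parse_data", "parse_text"};

    // Mirrors the quoted check: the segment counts as parsed if crawl_parse exists.
    static boolean isAlreadyParsed(Path segment) {
        return Files.exists(segment.resolve("crawl_parse"));
    }

    // Removes the parse output directories themselves, not just their contents.
    static void removeParseOutput(Path segment) throws IOException {
        for (String name : PARSE_DIRS) {
            Path dir = segment.resolve(name);
            if (!Files.exists(dir)) {
                continue;
            }
            List<Path> entries;
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse order lists children before their parent directory.
                entries = walk.sorted(Comparator.reverseOrder())
                              .collect(Collectors.toList());
            }
            for (Path entry : entries) {
                Files.delete(entry);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: ReparseSketch <segment_dir>");
            return;
        }
        Path segment = Paths.get(args[0]);
        if (isAlreadyParsed(segment)) {
            removeParseOutput(segment);
        }
        System.out.println("already parsed: " + isAlreadyParsed(segment));
    }
}
```

The fetched content in the segment is left untouched, so nothing would need to
be refetched; only the parse step would have to be run again on that segment.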



On Mon, Mar 4, 2013 at 10:49 PM, Tejas Patil <[email protected]> wrote:

> The code [0] checks if there is already a "crawl_parse" directory in the
> segment [lines 88-89].
>
>  88 if (fs.exists(new Path(out, CrawlDatum.PARSE_DIR_NAME)))
>  89   throw new IOException("Segment already parsed!");
>
> I am not sure what you meant by deleting a subsection of the
> directories. Did you mean deleting the contents of the old
> crawl_parse directory? I tried that locally and it didn't work.
>
> [0] :
>
> http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java?view=markup
>
>
> On Mon, Mar 4, 2013 at 4:20 PM, kiran chitturi <[email protected]> wrote:
>
> > It took me close to 2 days to fetch 400k pages on my not-so-fast single
> > machine. I do not want to refetch unless it is really crucial.
> >
> > I will check and see if deleting any sub-directories is helpful.
> >
> > Thanks!
> >
> >
> > On Mon, Mar 4, 2013 at 5:54 PM, Lewis John Mcgibbney <
> > [email protected]> wrote:
> >
> > > This makes perfect sense, Kiran. It is something I've encountered in
> > > the past, and as my segments were not production critical I was easily
> > > able to delete and re-fetch them, then parse out the stuff I wanted.
> > > As I said, I think this is the only way to go, I'm afraid.
> > >
> > > On Mon, Mar 4, 2013 at 2:25 PM, kiran chitturi <[email protected]> wrote:
> > >
> > > > Yeah. I used parse-(tika|metatags) first in the configuration and
> > > > now I want to use parse-(html|tika|metatags). This is due to the
> > > > new parse-metatags patch upgrade.
> > > >
> > > > Thanks for the suggestions. It would be very helpful if 1.x had an
> > > > option for reparsing segments, like 2.x has.
> > > >
> > > > Regards,
> > > > Kiran.
> > > >
> > > >
> > > > On Mon, Mar 4, 2013 at 4:51 PM, Lewis John Mcgibbney <
> > > > [email protected]> wrote:
> > > >
> > > > > Please don't go ahead and delete the parse directories just yet,
> > > > > before you hear back from others.
> > > > > My suggestion would be to try deleting a subsection of the
> > > > > directories and see if this is possible.
> > > > > Have you changed some configuration and now want to parse out some
> > > > > more content/structure?
> > > > >
> > > > >
> > > > > On Mon, Mar 4, 2013 at 1:33 PM, kiran chitturi <[email protected]> wrote:
> > > > >
> > > > > > Hi!
> > > > > >
> > > > > > I am trying to reparse Nutch segments and it says 'Segment
> > > > > > already parsed' when I try to parse.
> > > > > >
> > > > > > Is there any option or attribute such as '-reparse', like the 2.x
> > > > > > series has?
> > > > > >
> > > > > > Should I delete some directories so that I can reparse?
> > > > > >
> > > > > > Please give me suggestions on how to reparse segments that are
> > > > > > already parsed.
> > > > > >
> > > > > > Thanks,
> > > > > > --
> > > > > > Kiran Chitturi
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *Lewis*
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Kiran Chitturi
> > > >
> > >
> > >
> > >
> > > --
> > > *Lewis*
> > >
> >
> >
> >
> > --
> > Kiran Chitturi
> >
>



-- 
Kiran Chitturi
