Thanks, Tejas, for the information. Did you try deleting the 'crawl_parse' directory itself? Since the code checks for that directory, I will try deleting it and reparsing.
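
For reference, a minimal sketch of what deleting the parse output of a segment could look like on a local filesystem. Everything here is an assumption rather than something the thread confirms: the class and method names are hypothetical, the segment is assumed to live on local disk (a segment on HDFS would need the Hadoop FileSystem API instead), and the parse output is assumed to be the "crawl_parse", "parse_data" and "parse_text" sub-directories. It removes the directories themselves, not only their contents, because the check in ParseOutputFormat tests whether "crawl_parse" exists. Take a backup of the segment first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch.
class SegmentParseCleaner {

    // Assumed parse output sub-directories of a Nutch 1.x segment.
    static final List<String> PARSE_DIRS =
            Arrays.asList("crawl_parse", "parse_data", "parse_text");

    // Deletes each parse sub-directory that exists under the segment and
    // returns the names of the ones that were removed. Fetched data
    // (content, crawl_fetch, crawl_generate) is left untouched.
    static List<String> removeParseOutput(Path segment) throws IOException {
        List<String> removed = new ArrayList<>();
        for (String name : PARSE_DIRS) {
            Path dir = segment.resolve(name);
            if (!Files.exists(dir)) {
                continue;
            }
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse order so children are deleted before their parents.
                paths = walk.sorted(Comparator.reverseOrder())
                            .collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
            removed.add(name);
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SegmentParseCleaner <local segment dir>");
            return;
        }
        System.out.println("Removed: " + removeParseOutput(Paths.get(args[0])));
    }
}
```

After that, the segment could be passed to bin/nutch parse again. Whether this is sufficient for a clean reparse has not been confirmed in this thread.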

On Mon, Mar 4, 2013 at 10:49 PM, Tejas Patil <[email protected]> wrote:

> The code [0] checks if there is already a "crawl_parse" directory in the
> segment [lines 88-89].
>
> 88 if (fs.exists(new Path(out, CrawlDatum.PARSE_DIR_NAME)))
> 89 throw new IOException("Segment already parsed!");
>
> I am not sure what you guys meant by deleting the subsection of the
> directories. Did you mean deletion of the contents inside the old
> crawl_parse directory ? I tried that locally and it didn't work.
>
> [0] :
> http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java?view=markup
>
> On Mon, Mar 4, 2013 at 4:20 PM, kiran chitturi <[email protected]> wrote:
>
> > It took me close to 2 days to fetch 400k pages on my not so fast single
> > machine. I do not want to refetch unless it very crucial.
> >
> > I will check and see if deleting any sub-directories is helpful
> >
> > Thanks!
> >
> > On Mon, Mar 4, 2013 at 5:54 PM, Lewis John Mcgibbney <[email protected]> wrote:
> >
> > > This makes perfect sense Kiran. It is something I've encountered in the
> > > past and as my segments were not production critical I was easily able to
> > > delete and re-fetch them then parse out the stuff I wanted to.
> > > As I said, I think this is the only way to get I'm afraid.
> > >
> > > On Mon, Mar 4, 2013 at 2:25 PM, kiran chitturi <[email protected]> wrote:
> > >
> > > > Yeah. I used parse-(tika|metatags) first in the configuration and now i
> > > > want to use parse-(html|tika|metatags). This is due to the parse-metatags
> > > > new patch upgrade.
> > > >
> > > > Thanks for the suggestions. It would be very helpful for reparsing segments
> > > > for 1.x like 2.x has.
> > > >
> > > > Regards,
> > > > Kiran.
> > > >
> > > > On Mon, Mar 4, 2013 at 4:51 PM, Lewis John Mcgibbney <[email protected]> wrote:
> > > >
> > > > > Please don't go ahead and delete the parse directories just yet before you
> > > > > hear back from others.
> > > > > My suggestion would be to try and delete a subsection of the directories
> > > > > and see if this is possible.
> > > > > Have you changed some configuration and now want to parse out some more
> > > > > content/structure?
> > > > >
> > > > > On Mon, Mar 4, 2013 at 1:33 PM, kiran chitturi <[email protected]> wrote:
> > > > >
> > > > > > Hi!
> > > > > >
> > > > > > I am trying to reparse Nutch segments and it says 'Segment already parsed'
> > > > > > when i try to parse.
> > > > > >
> > > > > > Is there any option of attribute as '-reparse' like 2.x series has ?
> > > > > >
> > > > > > Should i delete some directories so that i can reparse ?
> > > > > >
> > > > > > Please give me suggestions on how to reparse segments that are already
> > > > > > parsed.
> > > > > >
> > > > > > Thanks,
> > > > > > --
> > > > > > Kiran Chitturi
> > > > >
> > > > > --
> > > > > *Lewis*
> > > >
> > > > --
> > > > Kiran Chitturi
> > >
> > > --
> > > *Lewis*
> >
> > --
> > Kiran Chitturi

--
Kiran Chitturi

