Don't hold your breath :( Failing at Count: 832000

David Reyes Samblas Martinez
http://www.tuxbrain.com
Open ultraportable & embedded solutions
Openmoko, Openpandora, Arduino
Hey, watch out!!! There's a linux in your pocket!!!

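On the IndexError from the index step quoted further down in this thread: the failing line reads body[0], which raises "string index out of range" when body is an empty string, so an article with empty text is a plausible trigger (I have not confirmed that against the dump). A minimal sketch of a guard, assuming body is a plain string. The name is_redirect is hypothetical and not taken from FileScanner.py:

```python
def is_redirect(body):
    """Return True if the article text starts with '#redirect' (any case).

    Hypothetical helper, not part of FileScanner.py. It is meant to match
    the original test ('#' == body[0] and 'redirect' == body[1:9].lower())
    for non-empty text, while returning False for empty text.
    """
    # Slicing never raises on short or empty strings, unlike body[0].
    return body[:9].lower() == '#redirect'
```

Slicing is used here because an out-of-range slice just yields a shorter string, so no separate length check is needed.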
2009/11/20 Tilman Baumann <til...@baumann.name>:
>
> David Reyes Samblas Martinez wrote:
>> Well, the Spanish one gave me the same error before, but now it works,
> Any idea what solved it? Or is it just random and will go away if I try it
> again? :)
>
>> I'm parsing the de wikipedia right now (Count: 173000), let's see what
>> happens :)
>
> I would definitely be interested in the results...
>
>> Note: Parsing the 2009-Nov-11
>> http://download.wikipedia.org/dewiki/latest/dewiki-latest-pages-articles.xml.bz2
>>
>> Regards
>>
>> 2009/11/20 Tilman Baumann <til...@baumann.name>:
>>> Can you reproduce this with a neutral locale?
>>> export LC_ALL=C
>>>
>>> I'm at the moment trying the same. I had a lot of hiccups, caused by
>>> many things. Among them missing tools and not enough memory.
>>>
>>> This is currently where I'm stuck with the German wikipedia.
>>>
>>> Count: 823000
>>> Count: 824000
>>> Count: 825000
>>> Count: 826000
>>> Count: 827000
>>> Count: 828000
>>> Count: 829000
>>> Count: 830000
>>> Count: 831000
>>> Count: 832000
>>> Count: 833000
>>> Traceback (most recent call last):
>>>   File "./ArticleParser.py", line 203, in <module>
>>>     main()
>>>   File "./ArticleParser.py", line 168, in main
>>>     process_article_text(title.encode('utf-8'), f.read(length), newf)
>>>   File "./ArticleParser.py", line 197, in process_article_text
>>>     newf.write(text + '\n')
>>> IOError: [Errno 32] Broken pipe
>>> make[1]: *** [parse] Error 1
>>> make[1]: Leaving directory `/home/tilli/wikireader/host-tools/offline-renderer'
>>> make: *** [parse] Error 2
>>>
>>> I suppose it failed somewhere in PARSER_COMMAND
>>>
>>> Before that, the following steps went through without fail.
>>> make
>>> make DESTDIR=image WORKDIR=work XML_FILES=dewiki-20091028-pages-articles.xml index
>>>
>>> David Reyes Samblas Martinez wrote:
>>>> After the "success" of the Spanish Wikipedia (the indexing part is
>>>> still pending), I started work on the German Wikipedia
>>>> http://download.wikipedia.org/dewiki/latest/dewiki-latest-pages-meta-current.xml.bz2
>>>>
>>>> but it fails at the first step with the following error
>>>>
>>>> #make DESTDIR=image WORKDIR=work XML_FILES=dewiki-latest-pages-meta-current.xml index parse render combine
>>>>
>>>> awk: línea ord.:1: fatal: no se puede abrir el fichero `work/counts.text' para lectura (No existe el fichero ó directorio)
>>>> cd host-tools/offline-renderer && make index \
>>>>   XML_FILES="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml" RENDER_BLOCK="0" \
>>>>   WORKDIR="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work" DESTDIR="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image"
>>>> make[1]: se ingresa al directorio `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
>>>> ./ArticleIndex.py \
>>>>   --article-index="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/articles.db" \
>>>>   --article-offsets="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/offsets.db" \
>>>>   --article-counts="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/counts.text" \
>>>>   --prefix="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image/pedia" /OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml
>>>> Traceback (most recent call last):
>>>>   File "./ArticleIndex.py", line 611, in <module>
>>>>     main()
>>>>   File "./ArticleIndex.py", line 172, in main
>>>>     limit = processor.process(f, limit)
>>>>   File "/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer/FileScanner.py", line 141, in process
>>>>     if '#' == body[0] and 'redirect' == body[1:9].lower():
>>>> IndexError: string index out of range
>>>> Flushing databases
>>>> Writing: files
>>>> Time: 0s
>>>> Writing: articles
>>>> Time: 0s
>>>> Writing: offsets
>>>> Time: 0s
>>>> Loading: articles
>>>> Time: 0s
>>>> Loading: offsets and files
>>>> Time: 0s
>>>> make[1]: *** [index] Error 1
>>>> make[1]: se sale del directorio `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
>>>> make: *** [index] Error 2
>>>>
>>>> Regards

_______________________________________________
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community