On 18 April 2013 17:11, Julien Plu <[email protected]> wrote:
>> You could also simply increase the number of retries (currently 3) or
>> the maximum time (currently 4000 ms) in AbstractExtractor.scala.
>
> OK, I think I will change the maximum time value to see if I still get
> the error. By the way, my build failure is apparently due to Maven.
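(For reference, raising those values would be a change along these lines. This is only a sketch: the names maxRetries and readTimeoutMs, and the retry loop itself, are illustrative and may not match the actual code in AbstractExtractor.scala.)

    import java.net.{HttpURLConnection, URL, SocketTimeoutException}
    import scala.io.Source

    // Illustrative only: a retry loop with a configurable retry count and
    // read timeout, similar in spirit to what the abstract extractor does.
    object AbstractRetrySketch {

      val maxRetries = 5         // e.g. raise from 3
      val readTimeoutMs = 10000  // e.g. raise from 4000 ms

      // POSTs the given form data to api.php and returns the response body,
      // retrying up to maxRetries times on read timeouts.
      def retrievePage(apiUrl: String, postData: String): Option[String] = {
        var result: Option[String] = None
        var attempt = 0
        while (result.isEmpty && attempt < maxRetries) {
          attempt += 1
          try {
            val conn = new URL(apiUrl).openConnection().asInstanceOf[HttpURLConnection]
            conn.setDoOutput(true)
            conn.setConnectTimeout(readTimeoutMs)
            conn.setReadTimeout(readTimeoutMs)
            conn.getOutputStream.write(postData.getBytes("UTF-8"))
            result = Some(Source.fromInputStream(conn.getInputStream, "UTF-8").mkString)
          } catch {
            case e: SocketTimeoutException =>
              println("Read timed out, attempt " + attempt + " of " + maxRetries + ". Retrying...")
          }
        }
        result
      }
    }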
Oh, you are right. I thought the build failed because the abstract for
that page could not be generated, but now I remember that the extraction
job just logs that error and keeps going. Sorry, I was quite mistaken.
Now I'm confused. I don't know what happened. You might want to run
Maven with -e or -X to get more detailed error messages.

JC

> Best.
>
> Julien.
>
>
> 2013/4/18 Jona Christopher Sahnwaldt <[email protected]>
>>
>> On 18 April 2013 16:08, Julien Plu <[email protected]> wrote:
>> > Hi Jona,
>> >
>> > The API responds correctly :-( I think at least that the
>> > "SocketTimeoutException" occurs because an abstract doesn't exist, no?
>>
>> I don't think so. I think it usually happens because generating the
>> abstract actually takes extremely long for some pages. I don't know
>> why. Maybe they happen to use some very complex templates. On the
>> other hand, I took a quick look at the source code of
>> http://fr.wikipedia.org/wiki/Prix_Ken_Domon and didn't see anything
>> suspicious.
>>
>> By the way, an abstract always exists; it may be empty, though. The
>> page content is not stored in the database, we send it to MediaWiki
>> for each page. That's much faster.
>>
>> > (because this exception appeared many times during the extraction)
>> > but it's not blocking.
>>
>> Yes, many other articles need a lot of time, but most of the time it
>> works on the second or third try.
>>
>> You could also simply increase the number of retries (currently 3) or
>> the maximum time (currently 4000 ms) in AbstractExtractor.scala.
>>
>> > And I have some gzipped files in my dump directories with data
>> > inside, so the extraction worked until the error occurred.
>> >
>> > I reran the extraction, but with "extraction.default.properties";
>> > we will see if there is an improvement...
>> >
>> > And the machine is a virtual machine (from VirtualBox) with 2 GB of
>> > memory and 3 cores from my computer, so it's normal that it's slow.
>> > But I will try it on another machine, a real server machine.
>> >
>> > Best.
>> >
>> > Julien.
>> >
>> >
>> > 2013/4/18 Jona Christopher Sahnwaldt <[email protected]>
>> >>
>> >> Hi Julien,
>> >>
>> >> That sucks. 21 hours and then it crashes. That's a bummer.
>> >>
>> >> I don't know what's going on. You could try calling api.php from the
>> >> command line using curl and see what happens. Maybe it actually takes
>> >> extremely long to render that article. Calling api.php is a bit
>> >> cumbersome though - I think you have to copy the wikitext for the
>> >> article from the XML dump and construct a POST request. It may be
>> >> simpler to hack together a little HTML page with a form for all the
>> >> data you need (page title and content, I think) which POSTs the data
>> >> to api.php. If you do that, let us know, I'd love to add such a test
>> >> page to our MediaWiki files in the repo.
>> >>
>> >> @all - Is there a simpler way to test the abstract extraction for a
>> >> single page? I don't remember.
>> >>
>> >> By the way, 21 hours for the French Wikipedia sounds pretty slow, if I
>> >> recall correctly. How many ms per page does the log file say? What
>> >> kind of machine do you have? I think on our reasonably but not
>> >> extremely fast machine with four cores it took something like 30 ms
>> >> per page. Are you sure you activated APC? That makes a huge
>> >> difference.
>> >>
>> >> Good luck,
>> >> JC
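(A rough illustration of the single-page test suggested above: POST the title and the wikitext copied from the XML dump to api.php and time the response. The api.php URL below is a placeholder for a local MediaWiki mirror, and the parameters follow MediaWiki's generic action=parse API; they may not match exactly what the extraction framework sends.)

    import java.net.{HttpURLConnection, URL, URLEncoder}
    import scala.io.Source

    // Sketch of a single-page abstract test: send one article's wikitext to a
    // local MediaWiki api.php, print how long parsing took, and dump the result.
    object SinglePageAbstractTest {
      def main(args: Array[String]): Unit = {
        val apiUrl = "http://localhost/mediawiki/api.php" // placeholder, adjust to your mirror
        val title = "Prix Ken Domon"
        val wikitext = "paste the page source from the XML dump here" // placeholder

        // Generic MediaWiki parse request; the framework's real parameters may differ.
        val postData =
          "action=parse&format=xml&prop=text" +
          "&title=" + URLEncoder.encode(title, "UTF-8") +
          "&text=" + URLEncoder.encode(wikitext, "UTF-8")

        val start = System.currentTimeMillis

        val conn = new URL(apiUrl).openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setDoOutput(true)
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded")
        conn.getOutputStream.write(postData.getBytes("UTF-8"))

        val response = Source.fromInputStream(conn.getInputStream, "UTF-8").mkString
        println("took " + (System.currentTimeMillis - start) + " ms")
        println(response)
      }
    }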
>> >>
>> >> On 18 April 2013 11:52, Julien Plu <[email protected]> wrote:
>> >> > Hi,
>> >> >
>> >> > After around 21 hours of processing, the abstract extraction was
>> >> > stopped by a "build failure":
>> >> >
>> >> > avr. 18, 2013 10:33:44 AM org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1 apply$mcVI$sp
>> >> > INFO: Error retrieving abstract of title=Prix Ken Domon;ns=0/Main/;language:wiki=fr,locale=fr. Retrying...
>> >> > java.net.SocketTimeoutException: Read timed out
>> >> >     at java.net.SocketInputStream.socketRead0(Native Method)
>> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:150)
>> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:121)
>> >> >     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>> >> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>> >> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>> >> >     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:633)
>> >> >     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:579)
>> >> >     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1322)
>> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1.apply$mcVI$sp(AbstractExtractor.scala:124)
>> >> >     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
>> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor.retrievePage(AbstractExtractor.scala:109)
>> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:66)
>> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:21)
>> >> >     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
>> >> >     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
>> >> >     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
>> >> >     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
>> >> >     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>> >> >     at scala.collection.immutable.List.foreach(List.scala:76)
>> >> >     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:239)
>> >> >     at scala.collection.immutable.List.flatMap(List.scala:76)
>> >> >     at org.dbpedia.extraction.mappings.CompositeMapping.extract(CompositeMapping.scala:13)
>> >> >     at org.dbpedia.extraction.mappings.RootExtractor.apply(RootExtractor.scala:23)
>> >> >     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:29)
>> >> >     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:25)
>> >> >     at org.dbpedia.extraction.util.SimpleWorkers$$anonfun$apply$1$$anon$2.process(Workers.scala:23)
>> >> >     at org.dbpedia.extraction.util.Workers$$anonfun$1$$anon$1.run(Workers.scala:131)
>> >> >
>> >> > [INFO] ------------------------------------------------------------------------
>> >> > [INFO] BUILD FAILURE
>> >> > [INFO] ------------------------------------------------------------------------
>> >> > [INFO] Total time: 21:33:55.973s
>> >> > [INFO] Finished at: Thu Apr 18 10:35:37 CEST 2013
>> >> > [INFO] Final Memory: 10M/147M
>> >> > [INFO] ------------------------------------------------------------------------
>> >> > [ERROR] Failed to execute goal org.scala-tools:maven-scala-plugin:2.15.2:run
>> >> > (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException:
>> >> > Process exited with an error: 137(Exit value: 137) -> [Help 1]
>> >> > [ERROR]
>> >> > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
>> >> > [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>> >> > [ERROR]
>> >> > [ERROR] For more information about the errors and possible solutions, please
>> >> > read the following articles:
>> >> > [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>> >> >
>> >> > Does someone know why this error happened? Not enough memory?
>> >> >
>> >> > Best.
>> >> >
>> >> > Julien.
