>You could also simply increase the number of retries (currently 3) or
>the maximum time (currently 4000 ms) in AbstractExtractor.scala.

OK, I think I will change the maximum time value and see if I still get
the error. By the way, my build failure is apparently due to Maven.
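
For reference, here is a rough sketch of what those two knobs look like in
AbstractExtractor.scala (the identifiers are approximated from the stack
trace below, not copied from the source, so check the actual file before
editing):

    // AbstractExtractor.scala (sketch) - retry loop around the api.php call.
    private val maxRetries = 3     // attempts per page before giving up
    private val timeoutMs  = 4000  // read timeout per attempt, in milliseconds

    private def retrievePage(title: String, wikiText: String): String = {
      for (counter <- 1 to maxRetries) {
        try {
          return postToApi(title, wikiText, timeoutMs) // hypothetical helper
        } catch {
          case e: java.net.SocketTimeoutException =>
            if (counter == maxRetries) throw e
            // otherwise the extractor logs "Retrying..." and tries again
        }
      }
      throw new IllegalStateException("unreachable")
    }

Raising timeoutMs gives slow pages like Prix_Ken_Domon more time per
attempt; raising maxRetries just makes more attempts with the same budget.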

Best.

Julien.


2013/4/18 Jona Christopher Sahnwaldt <[email protected]>

> On 18 April 2013 16:08, Julien Plu <[email protected]> wrote:
> > Hi Jona,
> >
> > The API responds correctly :-( At least, I think the
> > "SocketTimeoutException" occurs because an abstract doesn't exist, no?
>
> I don't think so. I think it usually happens because generating the
> abstract actually takes extremely long for some pages. I don't know
> why. Maybe they happen to use some very complex templates. On the
> other hand, I took a quick look at the source code of
> http://fr.wikipedia.org/wiki/Prix_Ken_Domon and didn't see anything
> suspicious.
>
> By the way, an abstract always exists, though it may be empty. The
> page content is not stored in the database; we send it to MediaWiki
> for each page. That's much faster.
>
> > (because this exception appeared many times during the extraction) but
> > it's not blocking.
>
> Yes, many other articles need a lot of time, but most of the time it
> works on the second or third try.
>
> You could also simply increase the number of retries (currently 3) or
> the maximum time (currently 4000 ms) in AbstractExtractor.scala.
>
> >
> > And I have some gzipped files in my dump directories with data inside, so
> > the extraction worked until the error occurred.
> >
> > I am rerunning the extraction, but with "extraction.default.properties"; we
> > will see if there is an improvement...
> >
> > And the machine is a virtual machine (VirtualBox) with 2 GB of memory
> > and 3 cores from my computer, so it's no surprise that it's slow. But I
> > will try it on another machine, a real server machine.
> >
> > Best.
> >
> > Julien.
> >
> >
> > 2013/4/18 Jona Christopher Sahnwaldt <[email protected]>
> >>
> >> Hi Julien,
> >>
> >> That sucks. 21 hours and then it crashes. That's a bummer.
> >>
> >> I don't know what's going on. You could try calling api.php from the
> >> command line using curl and see what happens. Maybe it actually takes
> >> extremely long to render that article. Calling api.php is a bit
> >> cumbersome though - I think you have to copy the wikitext for the
> >> article from the xml dump and construct a POST request. It may be
> >> simpler to hack together a little HTML page with a form for all the
> >> data you need (page title and content, I think) which POSTs the data
> >> to api.php. If you do that, let us know, I'd love to add such a test
> >> page to our MediaWiki files in the repo.
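> >>
> >> In the meantime, here is a minimal sketch of such a one-page test in
> >> Scala (the endpoint URL, the file name and the exact API parameters are
> >> assumptions - adjust them to your MediaWiki install and to what
> >> AbstractExtractor.scala actually sends):
> >>
> >>     import java.net.{URL, HttpURLConnection, URLEncoder}
> >>     import scala.io.Source
> >>
> >>     // Time a single api.php render: POST one page's wikitext - the same
> >>     // kind of request the AbstractExtractor makes - and report the time.
> >>     object AbstractApiTest extends App {
> >>       val apiUrl   = "http://localhost/mediawiki/api.php"  // assumed endpoint
> >>       val title    = "Prix Ken Domon"
> >>       // wikitext copied from the xml dump into a local file:
> >>       val wikiText = Source.fromFile("page.wiki", "UTF-8").mkString
> >>
> >>       def enc(s: String) = URLEncoder.encode(s, "UTF-8")
> >>       val body = "action=parse&format=xml&prop=text" +
> >>                  "&title=" + enc(title) + "&text=" + enc(wikiText)
> >>
> >>       val conn = new URL(apiUrl).openConnection().asInstanceOf[HttpURLConnection]
> >>       conn.setDoOutput(true)      // turns the request into a POST
> >>       conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded")
> >>       conn.setReadTimeout(60000)  // generous limit so slow pages can be measured
> >>
> >>       val start = System.currentTimeMillis
> >>       conn.getOutputStream.write(body.getBytes("UTF-8"))
> >>       val response = Source.fromInputStream(conn.getInputStream, "UTF-8").mkString
> >>       println("rendered in " + (System.currentTimeMillis - start) + " ms, " +
> >>               response.length + " chars")
> >>     }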
> >>
> >> @all - Is there a simpler way to test the abstract extraction for a
> >> single page? I don't remember.
> >>
> >> By the way, 21 hours for the French Wikipedia sounds pretty slow, if I
> >> recall correctly. How many ms per page does the log file say? What
> >> kind of machine do you have? I think on our reasonably but not
> >> extremely fast machine with four cores it took something like 30 ms
> >> per page. Are you sure you activated APC? That makes a huge
> >> difference.
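> >>
> >> (APC is the PHP opcode cache: without it, every api.php request recompiles
> >> MediaWiki from source, which by itself can explain a large slowdown. It is
> >> typically enabled with extension=apc.so and apc.enabled=1 in php.ini, plus
> >> $wgMainCacheType = CACHE_ACCEL; in LocalSettings.php, but the details
> >> depend on your PHP setup.)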
> >>
> >> Good luck,
> >> JC
> >>
> >> On 18 April 2013 11:52, Julien Plu <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > After around 21 hours of processing, the abstract extraction was
> >> > stopped by a "build failure":
> >> >
> >> > avr. 18, 2013 10:33:44 AM org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1 apply$mcVI$sp
> >> > INFO: Error retrieving abstract of title=Prix Ken Domon;ns=0/Main/;language:wiki=fr,locale=fr. Retrying...
> >> > java.net.SocketTimeoutException: Read timed out
> >> >     at java.net.SocketInputStream.socketRead0(Native Method)
> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:150)
> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:121)
> >> >     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> >> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> >> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> >> >     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:633)
> >> >     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:579)
> >> >     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1322)
> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1.apply$mcVI$sp(AbstractExtractor.scala:124)
> >> >     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor.retrievePage(AbstractExtractor.scala:109)
> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:66)
> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:21)
> >> >     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
> >> >     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
> >> >     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
> >> >     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
> >> >     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
> >> >     at scala.collection.immutable.List.foreach(List.scala:76)
> >> >     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:239)
> >> >     at scala.collection.immutable.List.flatMap(List.scala:76)
> >> >     at org.dbpedia.extraction.mappings.CompositeMapping.extract(CompositeMapping.scala:13)
> >> >     at org.dbpedia.extraction.mappings.RootExtractor.apply(RootExtractor.scala:23)
> >> >     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:29)
> >> >     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:25)
> >> >     at org.dbpedia.extraction.util.SimpleWorkers$$anonfun$apply$1$$anon$2.process(Workers.scala:23)
> >> >     at org.dbpedia.extraction.util.Workers$$anonfun$1$$anon$1.run(Workers.scala:131)
> >> > [INFO] ------------------------------------------------------------------------
> >> > [INFO] BUILD FAILURE
> >> > [INFO] ------------------------------------------------------------------------
> >> > [INFO] Total time: 21:33:55.973s
> >> > [INFO] Finished at: Thu Apr 18 10:35:37 CEST 2013
> >> > [INFO] Final Memory: 10M/147M
> >> > [INFO] ------------------------------------------------------------------------
> >> > [ERROR] Failed to execute goal org.scala-tools:maven-scala-plugin:2.15.2:run (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 137(Exit value: 137) -> [Help 1]
> >> > [ERROR]
> >> > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> >> > [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> >> > [ERROR]
> >> > [ERROR] For more information about the errors and possible solutions, please read the following articles:
> >> > [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> >> >
> >> > Does someone know why this error happened? Not enough memory?
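> >> >
> >> > (For what it's worth, exit value 137 is 128 + 9, i.e. the process was
> >> > killed with SIGKILL. On a 2 GB VM that is usually the Linux out-of-memory
> >> > killer, so "not enough memory" is very likely the right diagnosis: give
> >> > the VM more memory, or reduce the JVM heap and the number of parallel
> >> > workers.)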
> >> >
> >> > Best.
> >> >
> >> > Julien.
> >> >
