On 18 April 2013 17:11, Julien Plu <[email protected]> wrote:
>> You could also simply increase the number of retries (currently 3) or
>> the maximum time (currently 4000 ms) in AbstractExtractor.scala.
>
> OK, I think I will change the maximum time value to see if I still get
> the error. By the way, my build failure is apparently due to Maven.
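(For reference, raising those values would be a change along these lines. This is only a sketch: the names maxRetries and readTimeoutMs, and the retry loop itself, are illustrative and may not match the actual code in AbstractExtractor.scala.)

    import java.net.{HttpURLConnection, URL, SocketTimeoutException}
    import scala.io.Source

    // Illustrative only: a retry loop with a configurable retry count and
    // read timeout, similar in spirit to what the abstract extractor does.
    object AbstractRetrySketch {

      val maxRetries = 5         // e.g. raise from 3
      val readTimeoutMs = 10000  // e.g. raise from 4000 ms

      // POSTs the given form data to api.php and returns the response body,
      // retrying up to maxRetries times on read timeouts.
      def retrievePage(apiUrl: String, postData: String): Option[String] = {
        var result: Option[String] = None
        var attempt = 0
        while (result.isEmpty && attempt < maxRetries) {
          attempt += 1
          try {
            val conn = new URL(apiUrl).openConnection().asInstanceOf[HttpURLConnection]
            conn.setDoOutput(true)
            conn.setConnectTimeout(readTimeoutMs)
            conn.setReadTimeout(readTimeoutMs)
            conn.getOutputStream.write(postData.getBytes("UTF-8"))
            result = Some(Source.fromInputStream(conn.getInputStream, "UTF-8").mkString)
          } catch {
            case e: SocketTimeoutException =>
              println("Read timed out, attempt " + attempt + " of " + maxRetries + ". Retrying...")
          }
        }
        result
      }
    }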
Oh, you are right. I thought the build failed because the abstract for
that page could not be generated, but now I remember that the extraction
job just logs that error and keeps going. Sorry, I was quite mistaken.
Now I'm confused. I don't know what happened. You might want to run
Maven with -e or -X to get more detailed error messages.

JC

> Best.
>
> Julien.
>
>
> 2013/4/18 Jona Christopher Sahnwaldt <[email protected]>
>>
>> On 18 April 2013 16:08, Julien Plu <[email protected]> wrote:
>> > Hi Jona,
>> >
>> > The API responds correctly :-( I think at least that the
>> > "SocketTimeoutException" occurs because an abstract doesn't exist, no?
>>
>> I don't think so. I think it usually happens because generating the
>> abstract actually takes extremely long for some pages. I don't know
>> why. Maybe they happen to use some very complex templates. On the
>> other hand, I took a quick look at the source code of
>> http://fr.wikipedia.org/wiki/Prix_Ken_Domon and didn't see anything
>> suspicious.
>>
>> By the way, an abstract always exists; it may be empty, though. The
>> page content is not stored in the database, we send it to MediaWiki
>> for each page. That's much faster.
>>
>> > (because this exception appeared many times during the extraction)
>> > but it's not blocking.
>>
>> Yes, many other articles need a lot of time, but most of the time it
>> works on the second or third try.
>>
>> You could also simply increase the number of retries (currently 3) or
>> the maximum time (currently 4000 ms) in AbstractExtractor.scala.
>>
>> > And I have some gzipped files in my dump directories with data
>> > inside, so the extraction worked until the error occurred.
>> >
>> > I reran the extraction, but with "extraction.default.properties";
>> > we will see if there is an improvement...
>> >
>> > And the machine is a virtual machine (from VirtualBox) with 2 GB of
>> > memory and 3 cores from my computer, so it's normal that it's slow.
>> > But I will try it on another machine, a real server machine.
>> >
>> > Best.
>> >
>> > Julien.
>> >
>> >
>> > 2013/4/18 Jona Christopher Sahnwaldt <[email protected]>
>> >>
>> >> Hi Julien,
>> >>
>> >> That sucks. 21 hours and then it crashes. That's a bummer.
>> >>
>> >> I don't know what's going on. You could try calling api.php from the
>> >> command line using curl and see what happens. Maybe it actually takes
>> >> extremely long to render that article. Calling api.php is a bit
>> >> cumbersome though - I think you have to copy the wikitext for the
>> >> article from the XML dump and construct a POST request. It may be
>> >> simpler to hack together a little HTML page with a form for all the
>> >> data you need (page title and content, I think) which POSTs the data
>> >> to api.php. If you do that, let us know, I'd love to add such a test
>> >> page to our MediaWiki files in the repo.
>> >>
>> >> @all - Is there a simpler way to test the abstract extraction for a
>> >> single page? I don't remember.
>> >>
>> >> By the way, 21 hours for the French Wikipedia sounds pretty slow, if I
>> >> recall correctly. How many ms per page does the log file say? What
>> >> kind of machine do you have? I think on our reasonably but not
>> >> extremely fast machine with four cores it took something like 30 ms
>> >> per page. Are you sure you activated APC? That makes a huge
>> >> difference.
>> >>
>> >> Good luck,
>> >> JC
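(A rough illustration of the single-page test suggested above: POST the title and the wikitext copied from the XML dump to api.php and time the response. The api.php URL below is a placeholder for a local MediaWiki mirror, and the parameters follow MediaWiki's generic action=parse API; they may not match exactly what the extraction framework sends.)

    import java.net.{HttpURLConnection, URL, URLEncoder}
    import scala.io.Source

    // Sketch of a single-page abstract test: send one article's wikitext to a
    // local MediaWiki api.php, print how long parsing took, and dump the result.
    object SinglePageAbstractTest {
      def main(args: Array[String]): Unit = {
        val apiUrl = "http://localhost/mediawiki/api.php" // placeholder, adjust to your mirror
        val title = "Prix Ken Domon"
        val wikitext = "paste the page source from the XML dump here" // placeholder

        // Generic MediaWiki parse request; the framework's real parameters may differ.
        val postData =
          "action=parse&format=xml&prop=text" +
          "&title=" + URLEncoder.encode(title, "UTF-8") +
          "&text=" + URLEncoder.encode(wikitext, "UTF-8")

        val start = System.currentTimeMillis

        val conn = new URL(apiUrl).openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setDoOutput(true)
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded")
        conn.getOutputStream.write(postData.getBytes("UTF-8"))

        val response = Source.fromInputStream(conn.getInputStream, "UTF-8").mkString
        println("took " + (System.currentTimeMillis - start) + " ms")
        println(response)
      }
    }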
>> >>
>> >> On 18 April 2013 11:52, Julien Plu <[email protected]> wrote:
>> >> > Hi,
>> >> >
>> >> > After around 21 hours of processing, the abstract extraction was
>> >> > stopped by a "build failure":
>> >> >
>> >> > avr. 18, 2013 10:33:44 AM org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1 apply$mcVI$sp
>> >> > INFO: Error retrieving abstract of title=Prix Ken Domon;ns=0/Main/;language:wiki=fr,locale=fr. Retrying...
>> >> > java.net.SocketTimeoutException: Read timed out
>> >> >     at java.net.SocketInputStream.socketRead0(Native Method)
>> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:150)
>> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:121)
>> >> >     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>> >> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>> >> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>> >> >     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:633)
>> >> >     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:579)
>> >> >     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1322)
>> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1.apply$mcVI$sp(AbstractExtractor.scala:124)
>> >> >     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
>> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor.retrievePage(AbstractExtractor.scala:109)
>> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:66)
>> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:21)
>> >> >     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
>> >> >     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
>> >> >     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
>> >> >     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
>> >> >     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>> >> >     at scala.collection.immutable.List.foreach(List.scala:76)
>> >> >     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:239)
>> >> >     at scala.collection.immutable.List.flatMap(List.scala:76)
>> >> >     at org.dbpedia.extraction.mappings.CompositeMapping.extract(CompositeMapping.scala:13)
>> >> >     at org.dbpedia.extraction.mappings.RootExtractor.apply(RootExtractor.scala:23)
>> >> >     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:29)
>> >> >     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:25)
>> >> >     at org.dbpedia.extraction.util.SimpleWorkers$$anonfun$apply$1$$anon$2.process(Workers.scala:23)
>> >> >     at org.dbpedia.extraction.util.Workers$$anonfun$1$$anon$1.run(Workers.scala:131)
>> >> >
>> >> > [INFO] ------------------------------------------------------------------------
>> >> > [INFO] BUILD FAILURE
>> >> > [INFO] ------------------------------------------------------------------------
>> >> > [INFO] Total time: 21:33:55.973s
>> >> > [INFO] Finished at: Thu Apr 18 10:35:37 CEST 2013
>> >> > [INFO] Final Memory: 10M/147M
>> >> > [INFO] ------------------------------------------------------------------------
>> >> > [ERROR] Failed to execute goal org.scala-tools:maven-scala-plugin:2.15.2:run
>> >> > (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException:
>> >> > Process exited with an error: 137(Exit value: 137) -> [Help 1]
>> >> > [ERROR]
>> >> > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
>> >> > [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>> >> > [ERROR]
>> >> > [ERROR] For more information about the errors and possible solutions, please
>> >> > read the following articles:
>> >> > [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>> >> >
>> >> > Does someone know why this error happened? Not enough memory?
>> >> >
>> >> > Best.
>> >> >
>> >> > Julien.
