I am rerunning the extraction with extraction.default.properties; I will see
tomorrow whether the error appears again. I will let you know.

Best.

Julien.


2013/4/18 Jona Christopher Sahnwaldt <[email protected]>

> On 18 April 2013 17:11, Julien Plu <[email protected]>
> wrote:
> >>You could also simply increase the number of retries (currently 3) or
> >>the maximum time (currently 4000 ms) in AbstractExtractor.scala.
> >
> > OK, I think I will change the maximum time value to see if I still have
> > the error. By the way, my build failure is apparently due to Maven.
>
> Oh, you are right. I thought the build failed because the abstract for
> that page could not be generated, but now I remembered the extraction
> job just logs that error and keeps going. Sorry, I was quite mistaken.
>
> Now I'm confused. I don't know what happened. You might want to run
> Maven with -e or -X to get more detailed error messages.
>
> JC
>
> >
> > Best.
> >
> > Julien.
> >
> >
> > 2013/4/18 Jona Christopher Sahnwaldt <[email protected]>
> >>
> >> On 18 April 2013 16:08, Julien Plu <[email protected]>
> >> wrote:
> >> > Hi Jona,
> >> >
> >> > The API responds correctly :-( At least I think the
> >> > "SocketTimeoutException" occurs because an abstract doesn't exist, no?
> >>
> >> I don't think so. I think it usually happens because generating the
> >> abstract actually takes extremely long for some pages. I don't know
> >> why. Maybe they happen to use some very complex templates. On the
> >> other hand, I took a quick look at the source code of
> >> http://fr.wikipedia.org/wiki/Prix_Ken_Domon and didn't see anything
> >> suspicious.
> >>
> >> By the way, an abstract always exists, though it may be empty. The
> >> page content is not stored in the database; we send it to MediaWiki
> >> for each page. That's much faster.
> >>
> >> > (because this exception appeared many times during the extraction) but
> >> > it's not blocking.
> >>
> >> Yes, many other articles need a lot of time, but it usually
> >> works on the second or third try.
> >>
> >> You could also simply increase the number of retries (currently 3) or
> >> the maximum time (currently 4000 ms) in AbstractExtractor.scala.
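Here is a rough sketch of that retry behaviour (a shell analogue, not the actual Scala code; fetch_abstract is a stub that simply fails twice so the loop is visible):

```shell
# Retry loop analogous to the one assumed in AbstractExtractor.scala:
# try up to MAX_RETRIES times, give up with an error otherwise.
MAX_RETRIES=3

tries=0
fetch_abstract() {            # stub standing in for the api.php request;
  tries=$((tries + 1))        # it fails twice, then succeeds
  [ "$tries" -ge 3 ]
}

attempt=1
until fetch_abstract; do
  if [ "$attempt" -ge "$MAX_RETRIES" ]; then
    echo "giving up after $MAX_RETRIES attempts" >&2
    exit 1
  fi
  attempt=$((attempt + 1))
done
echo "succeeded on attempt $attempt"
```

With three retries and a stub that fails twice, this prints "succeeded on attempt 3"; raising MAX_RETRIES or the per-request timeout just widens that window.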
> >>
> >> >
> >> > And I have some gzip files in my dump directories with data inside, so
> >> > the extraction worked until the error occurred.
> >> >
> >> > I am rerunning the extraction, but with "extraction.default.properties";
> >> > we will see if there is an improvement...
> >> >
> >> > And the machine is a VirtualBox virtual machine with 2 GB of memory and
> >> > 3 cores from my computer, so it's normal that it's slow. But I will try
> >> > it on another machine, a real server.
> >> >
> >> > Best.
> >> >
> >> > Julien.
> >> >
> >> >
> >> > 2013/4/18 Jona Christopher Sahnwaldt <[email protected]>
> >> >>
> >> >> Hi Julien,
> >> >>
> >> >> That sucks. 21 hours and then it crashes. That's a bummer.
> >> >>
> >> >> I don't know what's going on. You could try calling api.php from the
> >> >> command line using curl and see what happens. Maybe it actually takes
> >> >> extremely long to render that article. Calling api.php is a bit
> >> >> cumbersome though - I think you have to copy the wikitext for the
> >> >> article from the XML dump and construct a POST request. It may be
> >> >> simpler to hack together a little HTML page with a form for all the
> >> >> data you need (page title and content, I think) which POSTs the data
> >> >> to api.php. If you do that, let us know, I'd love to add such a test
> >> >> page to our MediaWiki files in the repo.
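For the curl route, something along these lines might work (untested sketch: the localhost api.php URL and the page.wikitext file are assumptions; action=parse with title/text are standard MediaWiki API parameters):

```shell
# Time a single parse by POSTing the wikitext straight to a local api.php.
# page.wikitext holds the article source copied from the XML dump.
time curl -sS 'http://localhost/mediawiki/api.php' \
  --data-urlencode 'action=parse' \
  --data-urlencode 'format=xml' \
  --data-urlencode 'title=Prix Ken Domon' \
  --data-urlencode 'text@page.wikitext' \
  -o /dev/null
```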
> >> >>
> >> >> @all - Is there a simpler way to test the abstract extraction for a
> >> >> single page? I don't remember.
> >> >>
> >> >> By the way, 21 hours for the French Wikipedia sounds pretty slow, if I
> >> >> recall correctly. How many ms per page does the log file say? What
> >> >> kind of machine do you have? I think on our reasonably but not
> >> >> extremely fast machine with four cores it took something like 30 ms
> >> >> per page. Are you sure you activated APC? That makes a huge
> >> >> difference.
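If in doubt, a quick way to check APC from the shell (assuming the standard apc extension name):

```shell
# Show whether the APC extension is loaded and enabled for the PHP
# that serves api.php (check the web SAPI too, not just the CLI).
php -m | grep -i apc
php -i | grep -i 'apc.enabled'
```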
> >> >>
> >> >> Good luck,
> >> >> JC
> >> >>
> >> >> On 18 April 2013 11:52, Julien Plu
> >> >> <[email protected]>
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > After around 21 hours of processing, the abstract extraction was
> >> >> > stopped by a "build failure":
> >> >> >
> >> >> > avr. 18, 2013 10:33:44 AM org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1 apply$mcVI$sp
> >> >> > INFO: Error retrieving abstract of title=Prix Ken Domon;ns=0/Main/;language:wiki=fr,locale=fr. Retrying...
> >> >> > java.net.SocketTimeoutException: Read timed out
> >> >> >     at java.net.SocketInputStream.socketRead0(Native Method)
> >> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:150)
> >> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:121)
> >> >> >     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> >> >> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> >> >> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> >> >> >     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:633)
> >> >> >     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:579)
> >> >> >     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1322)
> >> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1.apply$mcVI$sp(AbstractExtractor.scala:124)
> >> >> >     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
> >> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor.retrievePage(AbstractExtractor.scala:109)
> >> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:66)
> >> >> >     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:21)
> >> >> >     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
> >> >> >     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
> >> >> >     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
> >> >> >     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
> >> >> >     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
> >> >> >     at scala.collection.immutable.List.foreach(List.scala:76)
> >> >> >     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:239)
> >> >> >     at scala.collection.immutable.List.flatMap(List.scala:76)
> >> >> >     at org.dbpedia.extraction.mappings.CompositeMapping.extract(CompositeMapping.scala:13)
> >> >> >     at org.dbpedia.extraction.mappings.RootExtractor.apply(RootExtractor.scala:23)
> >> >> >     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:29)
> >> >> >     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:25)
> >> >> >     at org.dbpedia.extraction.util.SimpleWorkers$$anonfun$apply$1$$anon$2.process(Workers.scala:23)
> >> >> >     at org.dbpedia.extraction.util.Workers$$anonfun$1$$anon$1.run(Workers.scala:131)
> >> >> >
> >> >> > [INFO] ------------------------------------------------------------------------
> >> >> > [INFO] BUILD FAILURE
> >> >> > [INFO] ------------------------------------------------------------------------
> >> >> > [INFO] Total time: 21:33:55.973s
> >> >> > [INFO] Finished at: Thu Apr 18 10:35:37 CEST 2013
> >> >> > [INFO] Final Memory: 10M/147M
> >> >> > [INFO] ------------------------------------------------------------------------
> >> >> > [ERROR] Failed to execute goal org.scala-tools:maven-scala-plugin:2.15.2:run (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 137(Exit value: 137) -> [Help 1]
> >> >> > [ERROR]
> >> >> > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> >> >> > [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> >> >> > [ERROR]
> >> >> > [ERROR] For more information about the errors and possible solutions, please read the following articles:
> >> >> > [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> >> >> >
> >> >> > Does anyone know why this error happened? Not enough memory?
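Exit value 137 is 128 + 9, i.e. the process was killed with SIGKILL (signal 9); on Linux that is very often the kernel OOM killer, so "not enough memory" is a plausible guess. The exit code is easy to reproduce:

```shell
# A process killed by SIGKILL reports exit status 128 + 9 = 137 to its
# parent -- the same value Maven saw from the extraction JVM.
sleep 60 &
pid=$!
kill -9 "$pid"
wait "$pid"
echo "exit status: $?"   # 137
```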
> >> >> >
> >> >> > Best.
> >> >> >
> >> >> > Julien.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> ------------------------------------------------------------------------------
> >> >> > Precog is a next-generation analytics platform capable of advanced
> >> >> > analytics on semi-structured data. The platform includes APIs for
> >> >> > building
> >> >> > apps and a phenomenal toolset for data science. Developers can use
> >> >> > our toolset for easy data analysis & visualization. Get a free
> >> >> > account!
> >> >> > http://www2.precog.com/precogplatform/slashdotnewsletter
> >> >> > _______________________________________________
> >> >> > Dbpedia-discussion mailing list
> >> >> > [email protected]
> >> >> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
> >> >> >
> >> >
> >> >
> >
> >
>