Hi,
It worked perfectly, in only 4 hours, on a much more powerful machine than
my VM. I also increased the maximum time a little.
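
For the record, in case someone else hits this: exit code 137 means the
process was killed with SIGKILL (128 + 9), usually by the kernel's OOM
killer, which fits the move to a machine with more memory. The timeout
change itself is just a couple of constants in AbstractExtractor.scala.
A minimal sketch, with illustrative field names (check the actual ones in
the file):

    // Illustrative names - adjust the real constants in AbstractExtractor.scala.
    // Raising them trades extraction speed for robustness on slow pages.
    private val maxRetries = 5     // the thread mentions a default of 3
    private val maxTimeMs  = 8000  // the thread mentions a default of 4000 ms
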
Best.
Julien
2013/4/18 Julien Plu <[email protected]>
> I reran the extraction with extraction.default.properties. I will see
> tomorrow whether an error appears or not. I will let you know.
>
> Best.
>
> Julien.
>
>
> 2013/4/18 Jona Christopher Sahnwaldt <[email protected]>
>
>> On 18 April 2013 17:11, Julien Plu <[email protected]>
>> wrote:
>> >>You could also simply increase the number of retries (currently 3) or
>> >>the maximum time (currently 4000 ms) in AbstractExtractor.scala.
>> >
>> > OK, I think I will change the maximum time value to see if I still have
>> > the error. By the way, my build failure is apparently due to Maven.
>>
>> Oh, you are right. I thought the build failed because the abstract for
>> that page could not be generated, but now I remember that the extraction
>> job just logs that error and keeps going. Sorry, I was quite mistaken.
>>
>> Now I'm confused. I don't know what happened. You might want to run
>> Maven with -e or -X to get more detailed error messages.
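>>
>> For example, with the scala:run goal that your build log shows (add
>> whatever launcher arguments you normally pass):
>>
>>     mvn -e scala:run   # -e prints the full stack trace of errors
>>     mvn -X scala:run   # -X enables full debug logging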
>>
>> JC
>>
>> >
>> > Best.
>> >
>> > Julien.
>> >
>> >
>> > 2013/4/18 Jona Christopher Sahnwaldt <[email protected]>
>> >>
>> >> On 18 April 2013 16:08, Julien Plu <[email protected]> wrote:
>> >> > Hi Jona,
>> >> >
>> >> > The API responds correctly :-( I at least think that the
>> >> > "SocketTimeoutException" occurs because an abstract doesn't exist, no?
>> >>
>> >> I don't think so. I think it usually happens because generating the
>> >> abstract actually takes extremely long for some pages. I don't know
>> >> why. Maybe they happen to use some very complex templates. On the
>> >> other hand, I took a quick look at the source code of
>> >> http://fr.wikipedia.org/wiki/Prix_Ken_Domon and didn't see anything
>> >> suspicious.
>> >>
>> >> By the way, an abstract always exists; it may be empty, though. The
>> >> page content is not stored in the database; we send it to MediaWiki
>> >> for each page. That's much faster.
>> >>
>> >> > (because this exception appeared many times during the extraction),
>> >> > but it's not blocking.
>> >>
>> >> Yes, many other articles need a lot of time, but most of the time it
>> >> works on the second or third try.
>> >>
>> >> You could also simply increase the number of retries (currently 3) or
>> >> the maximum time (currently 4000 ms) in AbstractExtractor.scala.
>> >>
>> >> >
>> >> > And I have some gzipped files in my dump directories with data inside,
>> >> > so the extraction worked until the error occurred.
>> >> >
>> >> > I am rerunning the extraction, but with "extraction.default.properties";
>> >> > we will see if there is an improvement...
>> >> >
>> >> > And the machine is a virtual machine (from VirtualBox) with 2 GB of
>> >> > memory and 3 cores from my computer, so it's normal that it's slow like
>> >> > that. But I will try it on another machine, a real server machine.
>> >> >
>> >> > Best.
>> >> >
>> >> > Julien.
>> >> >
>> >> >
>> >> > 2013/4/18 Jona Christopher Sahnwaldt <[email protected]>
>> >> >>
>> >> >> Hi Julien,
>> >> >>
>> >> >> That sucks. 21 hours and then it crashes. That's a bummer.
>> >> >>
>> >> >> I don't know what's going on. You could try calling api.php from the
>> >> >> command line using curl and see what happens. Maybe it actually takes
>> >> >> extremely long to render that article. Calling api.php is a bit
>> >> >> cumbersome though - I think you have to copy the wikitext for the
>> >> >> article from the xml dump and construct a POST request. It may be
>> >> >> simpler to hack together a little HTML page with a form for all the
>> >> >> data you need (page title and content, I think) which POSTs the data
>> >> >> to api.php. If you do that, let us know - I'd love to add such a test
>> >> >> page to our MediaWiki files in the repo.
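>> >> >>
>> >> >> A rough sketch with curl (my assumptions: a local MediaWiki at
>> >> >> http://localhost/mediawiki/api.php and the wikitext saved in page.txt;
>> >> >> action=parse with title/text is the standard MediaWiki API way to
>> >> >> render raw wikitext):
>> >> >>
>> >> >>     curl -s http://localhost/mediawiki/api.php \
>> >> >>       --data-urlencode 'action=parse' \
>> >> >>       --data-urlencode 'format=xml' \
>> >> >>       --data-urlencode 'title=Prix Ken Domon' \
>> >> >>       --data-urlencode "text=$(cat page.txt)"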
>> >> >>
>> >> >> @all - Is there a simpler way to test the abstract extraction for a
>> >> >> single page? I don't remember.
>> >> >>
>> >> >> By the way, 21 hours for the French Wikipedia sounds pretty slow, if I
>> >> >> recall correctly. How many ms per page does the log file say? What
>> >> >> kind of machine do you have? I think on our reasonably fast (but not
>> >> >> extremely fast) machine with four cores it took something like 30 ms
>> >> >> per page. Are you sure you activated APC? That makes a huge
>> >> >> difference.
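>> >> >>
>> >> >> (A minimal sketch for enabling APC in php.ini, assuming the extension
>> >> >> is installed - exact paths and cache sizes vary by setup:
>> >> >>
>> >> >>     extension=apc.so
>> >> >>     apc.enabled=1
>> >> >>     apc.shm_size=128M
>> >> >>
>> >> >> then check that "apc" shows up in the output of php -i.)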
>> >> >>
>> >> >> Good luck,
>> >> >> JC
>> >> >>
>> >> >> On 18 April 2013 11:52, Julien Plu <[email protected]> wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > After around 21 hours of processing, the abstract extraction was
>> >> >> > stopped by a "build failure":
>> >> >> >
>> >> >> > avr. 18, 2013 10:33:44 AM org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1 apply$mcVI$sp
>> >> >> > INFO: Error retrieving abstract of title=Prix Ken Domon;ns=0/Main/;language:wiki=fr,locale=fr. Retrying...
>> >> >> > java.net.SocketTimeoutException: Read timed out
>> >> >> >   at java.net.SocketInputStream.socketRead0(Native Method)
>> >> >> >   at java.net.SocketInputStream.read(SocketInputStream.java:150)
>> >> >> >   at java.net.SocketInputStream.read(SocketInputStream.java:121)
>> >> >> >   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>> >> >> >   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>> >> >> >   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>> >> >> >   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:633)
>> >> >> >   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:579)
>> >> >> >   at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1322)
>> >> >> >   at org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1.apply$mcVI$sp(AbstractExtractor.scala:124)
>> >> >> >   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
>> >> >> >   at org.dbpedia.extraction.mappings.AbstractExtractor.retrievePage(AbstractExtractor.scala:109)
>> >> >> >   at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:66)
>> >> >> >   at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:21)
>> >> >> >   at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
>> >> >> >   at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
>> >> >> >   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
>> >> >> >   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
>> >> >> >   at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>> >> >> >   at scala.collection.immutable.List.foreach(List.scala:76)
>> >> >> >   at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:239)
>> >> >> >   at scala.collection.immutable.List.flatMap(List.scala:76)
>> >> >> >   at org.dbpedia.extraction.mappings.CompositeMapping.extract(CompositeMapping.scala:13)
>> >> >> >   at org.dbpedia.extraction.mappings.RootExtractor.apply(RootExtractor.scala:23)
>> >> >> >   at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:29)
>> >> >> >   at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:25)
>> >> >> >   at org.dbpedia.extraction.util.SimpleWorkers$$anonfun$apply$1$$anon$2.process(Workers.scala:23)
>> >> >> >   at org.dbpedia.extraction.util.Workers$$anonfun$1$$anon$1.run(Workers.scala:131)
>> >> >> >
>> >> >> > [INFO] ------------------------------------------------------------------------
>> >> >> > [INFO] BUILD FAILURE
>> >> >> > [INFO] ------------------------------------------------------------------------
>> >> >> > [INFO] Total time: 21:33:55.973s
>> >> >> > [INFO] Finished at: Thu Apr 18 10:35:37 CEST 2013
>> >> >> > [INFO] Final Memory: 10M/147M
>> >> >> > [INFO] ------------------------------------------------------------------------
>> >> >> > [ERROR] Failed to execute goal org.scala-tools:maven-scala-plugin:2.15.2:run (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 137 (Exit value: 137) -> [Help 1]
>> >> >> > [ERROR]
>> >> >> > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
>> >> >> > [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>> >> >> > [ERROR]
>> >> >> > [ERROR] For more information about the errors and possible solutions, please read the following articles:
>> >> >> > [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>> >> >> >
>> >> >> > Does someone know why this error happened? Not enough memory?
>> >> >> >
>> >> >> > Best.
>> >> >> >
>> >> >> > Julien.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>>
>
>