Hi Zhiwei,

[CCing dbp-spotlight-developers mailing list]

On Mon, Apr 22, 2013 at 1:30 PM, Cai Zhiwei <[email protected]> wrote:
> I tried to index the English data set with pignlproc but got stuck on this step
> for a whole day. I used a very small dump file and tried every method
> mentioned at [1], but the problem still couldn't be solved.
>
> [1] https://github.com/dbpedia-spotlight/dbpedia-spotlight/issues/165
Yes, this is still an important open issue. There is a version that does a lot in memory, and sometimes RAM is not enough. The other version dumps a lot to disk, where available storage becomes the limit. This happens with the English Wikipedia, so it will certainly happen with the wiki-links dataset as well. I added one comment from Pablo to the issue. Do you have experience with Pig?

@Chris, did you continue trying to solve this issue? Maybe you can give Zhiwei some pointers?

Scalability issues pop up frequently when dealing with this amount of data. Solving them is probably part of the actual GSoC coding period (though I won't stop you from tackling it beforehand, like a good open source contributor ;) ).

In order to get familiar with the indexing pipeline and the code (warm-up phase), it might be best if you build a model from a subset of the English Wikipedia, either by limiting the input at the beginning of the Pig Latin script or by truncating the Wikipedia dump XML.

Cheers,
Max

_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
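[Editor's note: the "truncate the Wikipedia dump XML" suggestion above can be sketched as follows. This is a hypothetical helper, not part of pignlproc; it assumes the usual MediaWiki dump layout where each `</page>` closing tag sits on its own line, and keeps only the first N pages so indexing runs on a small subset during the warm-up phase.]

```python
def truncate_dump(in_path, out_path, max_pages=1000):
    """Copy a MediaWiki dump XML, keeping only the first max_pages <page>
    elements, and close the root element so the output stays well-formed.

    Assumption: </page> appears on its own line, as in standard dumps.
    """
    pages = 0
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)
            if "</page>" in line:
                pages += 1
                if pages >= max_pages:
                    break
        else:
            # Dump had fewer than max_pages pages; the whole file,
            # including the root closing tag, was already copied.
            return
        # We stopped early, so restore the root closing tag ourselves.
        dst.write("</mediawiki>\n")
```

A streaming, line-based cut like this avoids loading the multi-gigabyte dump into memory; the resulting file can then be fed to the indexing pipeline as-is.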
