Hey Wang,

The scripts and files you looked at are fairly new, some only from last
weekend. So the problem is not that those files fail to follow the tutorial
or its paths, but that the documentation is not yet up to date.
You can have a look at [1], which is what we used to index 10+ languages
recently. I will push some final changes to it tonight. A good warm-up
task for you would be the following: index_db.sh contains several
sed s/../ lines, which are a very hackish way of replacing the parameters
in the .parameters file. You can read [2] and replace those sed commands
in the script with actual parameters passed to pig, e.g.:

pig -param input=input.log -param out=outputDir script.pig
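
A minimal sketch of what a sed-free call in index_db.sh could look like, assuming bash. The helper name run_pig, the DRY_RUN switch and the parameter names are hypothetical and not taken from the actual script. Inside the Pig script the values are then referenced as $input and $out, as described in [2].

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a Pig script, passing each NAME=VALUE pair
# with -param instead of rewriting a .parameters file in place with sed.
# Usage: run_pig SCRIPT NAME=VALUE...
run_pig() {
  local script="$1"; shift
  local args=()
  local kv
  for kv in "$@"; do
    args+=(-param "$kv")
  done
  # DRY_RUN=1 (hypothetical switch) only prints the command instead of running it.
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo pig "${args[@]}" "$script"
  else
    pig "${args[@]}" "$script"
  fi
}
```

For example, `DRY_RUN=1 run_pig script.pig input=input.log out=outputDir` prints the same command as the one above. Because the values travel on the command line, the .parameters file no longer has to be edited between runs.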

Please fork the relevant repository and send a pull request so we can
review and discuss changes, see [3].

Best,
Jo

[1] https://github.com/jodaiber/model-quickstarter
[2] http://wiki.apache.org/pig/ParameterSubstitution
[3] https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Contributing


On Thu, Apr 18, 2013 at 7:03 PM, Wang Wei wrote:

> Hi Pablo,
>
> I have updated the DB-backed core page [1]. It currently works if you
> follow the steps exactly.
>
> The problems mainly come from the paths. I added notes under the commands
> that are path-sensitive.
> In general, I think files [2][3][4][5] need updates to make the
> program more robust, but I currently have no rights to change them. So,
> to run the indexing program, you have to follow the steps exactly and pay
> attention to all the notes.
>
> In [6], the Maven repository for spotlight core 0.6 is problematic, so
> users have to manually install that jar, which pignlproc depends on.
>
>
> By solving these problems, I got familiar with the project, the code,
> Maven, Scala... I would like to improve this page and contribute more to
> the project. Thanks.
>
> [1] https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization-(DB-backed-core)
> [2] https://github.com/dbpedia-spotlight/pignlproc/blob/master/src/main/java/pignlproc/helpers/RestrictedNGramGenerator.java (line 87)
> [3] https://github.com/dbpedia-spotlight/pignlproc/blob/master/examples/indexing/token_counts.pig.params (path problem)
> [4] https://github.com/dbpedia-spotlight/pignlproc/blob/master/utilities/split_train_test.py (line 35)
> [5] https://raw.github.com/jodaiber/dbpedia-spotlight/master/bin/index_db.sh
> [6] https://github.com/dbpedia-spotlight/pignlproc/blob/master/pom.xml
>
>
> Best Regards,
> Wang Wei
>
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Dbpedia-gsoc mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>