Hi Shivani,
You are right, this part is now deprecated. The configuration changed last
year and we didn't fix that part in the docs.
There is also a full page on GitHub explaining the format syntax [1].
As a warm-up task you can set up the configuration for your language and
create a sample dump from it.
Then you can update the documentation according to your experience. I'd
also suggest that you move the whole page to GitHub and link to it from the
main (GitHub) wiki page.
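To make the format syntax a bit more concrete, here is a minimal sketch in Scala that splits one format line into its parts. It goes only by the shape of the entries quoted further down from dump/extraction.default.properties (file suffix, then format name, then policies separated by semicolons). FormatEntry and parse are hypothetical names for illustration, not part of the extraction framework, and the page at [1] remains the reference for the real syntax.

```scala
// Hypothetical helper, NOT part of the extraction framework. It only mirrors
// the shape of the entries quoted from dump/extraction.default.properties.
final case class FormatEntry(suffix: String, format: String, policies: List[String])

object FormatEntry {
  private val Prefix = "format."

  /** Parses one line such as "format.ttl.gz=turtle-triples;uri-policy.iri". */
  def parse(line: String): Option[FormatEntry] = {
    val trimmed = line.trim
    // Skip blank lines, comments, and keys that are not format settings.
    if (trimmed.isEmpty || trimmed.startsWith("#") || !trimmed.startsWith(Prefix)) None
    else trimmed.split("=", 2) match {
      case Array(key, value) =>
        // Assumption: the first item is the format name, the rest are policies.
        value.split(";").map(_.trim).filter(_.nonEmpty).toList match {
          case format :: policies =>
            Some(FormatEntry(key.stripPrefix(Prefix), format, policies))
          case Nil => None
        }
      case _ => None
    }
  }
}
```

For example, FormatEntry.parse("format.nt.gz=n-triples;uri-policy.uri") gives the suffix nt.gz, the format n-triples and the single policy uri-policy.uri, while comment lines give None.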
Cheers,
Dimitris
[1]
https://github.com/dbpedia/extraction-framework/wiki/Input-File-Format-In-DBpedia-Extraction-Framework
On Tue, Apr 16, 2013 at 4:50 AM, Shivani Poddar
<[email protected]> wrote:
> Hi,
> The following page seems to have a couple of errors, which I encountered
> while setting up the codebase to begin contributing to the "Design a better /
> interactive display page." project:
>
> http://dbpedia.org/Internationalization/Guide#h152-7
>
> The second heading, "2. Encoding / resource namespace / titles", directs
> the user to change the following:
>
> [extraction_framework/core/src/main/scala]
> org.dbpedia.extraction.util.Language.scala
>
> // default: no language use generic domain
> val generic = Set[String]()
>
> // change to this if language xx should be extracted using the
> // generic domain
> val generic = Set("xx")
>
> Here the file name is not org.dbpedia.extraction.util.Language.scala; the
> file path is
> "extraction-framework/core/src/main/scala/org/dbpedia/extraction/util/Language.scala".
>
> Secondly, the referenced variables cannot be located in the file.
> Are they supposed to be created?
>
>
> The same goes for the dump/extraction.default.properties file.
> The guide suggests adjusting the value of the format variable, while
> the file already has settings like:
>
> # NT is unreadable anyway - might as well use URIs for en
> format.nt.gz=n-triples;uri-policy.uri
> format.nq.gz=n-quads;uri-policy.uri
>
> # Turtle is much more readable - use nice IRIs for all languages
> format.ttl.gz=turtle-triples;uri-policy.iri
> format.tql.gz=turtle-quads;uri-policy.iri
>
> It would be helpful if the documentation were more specific. I could update
> it based on the feedback here.
>
> Thank You,
> Shivani
>
>
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Dbpedia-gsoc mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>
>
--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage: http://aksw.org/DimitrisKontokostas