Hi Dimitris,

It is very encouraging to see such a responsive community :). I am very
excited to get through as many tasks as I can (with each one I grow more
confident about the project).
I have been working on what you suggested and ran into the problems
described below.
Once I am through with the configuration, I will move the complete
page http://dbpedia.org/Internationalization/Guide#h152-7 to the new wiki,
with updated parts in place of the deprecated ones.


> You are right, this part is now deprecated. The configuration changed last
> year and we didn't fix that part in the docs.
> There is also a full page on GitHub explaining the format syntax [1].
>
> As a warm-up task you can set up the configuration for your language and
> create a sample dump out of it.
> Then you can update the documentation according to your experience. I'd
> also suggest that you move the whole page to GitHub and link to it from the
> main (GitHub) wiki page.
>
>

I tried to follow the directions given in
http://dbpedia.org/Internationalization/Guide#h152-7, changing the
directories as appropriate.
I get the following error when I try to carry out "4. Interlinking":

$ shivani ~/extraction-framework/scripts/shell-scripts/interwiki_links-->$ sh interwiki_links.sh 'en' 'hi'
sed: can't read /home/shivani/extraction-framework/scripts/shell-scripts/interwiki_links/../../../dump/config.properties: No such file or directory
/en/interlanguage_links_en.nt not found! exiting...
/hi/interlanguage_links_hi.nt not found! exiting...
-------------------------------------------------------------------------------
Generating interlanguage links from en to hi
-------------------------------------------------------------------------------
interwiki_links.sh: line 59: /hi/interlanguage_links_hi.nt.reversed.en: No such file or directory
grep: /hi/interlanguage_links_hi.nt: No such file or directory
interwiki_links.sh: line 61: /en/interlanguage_links_en.nt.sorted.hi: No such file or directory
grep: /en/interlanguage_links_en.nt: No such file or directory
interwiki_links.sh: line 64: /en/sameas-hi-en.nt: No such file or directory
wc: /en/sameas-hi-en.nt: No such file or directory

I suspect this is caused by a faulty configuration. The old wiki refers to
/dump/extract.properties, whereas the new wiki should presumably refer to
/dump/extraction.iri.same.as.uri.properties, in the later part of section
"2. Encoding / resource namespace / titles".
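
For what it's worth, here is a minimal sketch of how one could check which
properties file is actually present in the dump directory before running the
script. The function name find_config and the idea of probing these file
names are my own assumptions, not something the framework provides; the file
names are simply the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of the extraction framework): print the first
# properties file found in the given dump directory, or fail if none exists.
find_config() {
    dump_dir="$1"
    # File names taken from this thread; which one interwiki_links.sh really
    # needs is an assumption.
    for name in config.properties extract.properties \
                extraction.default.properties; do
        if [ -f "$dump_dir/$name" ]; then
            printf '%s\n' "$dump_dir/$name"
            return 0
        fi
    done
    printf 'no known properties file in %s\n' "$dump_dir" >&2
    return 1
}
```

It could be called as find_config ~/extraction-framework/dump; a non-zero
exit status would match the "sed: can't read" message above.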

It would be great if you could point me to the respective fixes for this.


Thanks a lot!
Shivani



> Cheers,
> Dimitris
>
> [1]
> https://github.com/dbpedia/extraction-framework/wiki/Input-File-Format-In-DBpedia-Extraction-Framework
>
>
> On Tue, Apr 16, 2013 at 4:50 AM, Shivani Poddar <
> [email protected]> wrote:
>
>> Hi,
>> The following page might have a couple of errors, which I encountered
>> while setting up the codebase to begin contributing to the "Design a
>> better / interactive display page." project:
>>
>> http://dbpedia.org/Internationalization/Guide#h152-7
>>
>> The second heading "2. Encoding / resource namespace / titles" directs
>> the user to change the following:
>>
>> [extraction_framework/core/src/main/scala]
>> org.dbpedia.extraction.util.Language.scala
>>
>>      // default: no language use generic domain
>>      val generic = Set[String]()
>>
>>      // change to this if language xx should be extracted using the
>>      // generic domain
>>      val generic = Set("xx")
>>
>> Here the file name is not org.dbpedia.extraction.util.Language.scala;
>> the file path is
>> "extraction-framework/core/src/main/scala/org/dbpedia/extraction/util/Language.scala".
>>
>> Secondly, the referenced variables cannot be located in the file.
>> Are they supposed to be created?
>>
>>
>> The same goes for the dump/extraction.default.properties file.
>> It is suggested that the value of the format variable be adjusted, while
>> the file already has settings like:
>>
>> # NT is unreadable anyway - might as well use URIs for en
>> format.nt.gz=n-triples;uri-policy.uri
>> format.nq.gz=n-quads;uri-policy.uri
>>
>> # Turtle is much more readable - use nice IRIs for all languages
>> format.ttl.gz=turtle-triples;uri-policy.iri
>> format.tql.gz=turtle-quads;uri-policy.iri
>>
>> It would be helpful if the documentation were more specific. I could
>> update the documentation based on this feedback.
>>
>> Thank You,
>> Shivani
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Precog is a next-generation analytics platform capable of advanced
>> analytics on semi-structured data. The platform includes APIs for building
>> apps and a phenomenal toolset for data science. Developers can use
>> our toolset for easy data analysis & visualization. Get a free account!
>> http://www2.precog.com/precogplatform/slashdotnewsletter
>> _______________________________________________
>> Dbpedia-gsoc mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>
>>
>
>
> --
> Dimitris Kontokostas
> Department of Computer Science, University of Leipzig
> Research Group: http://aksw.org
> Homepage:http://aksw.org/DimitrisKontokostas
>



-- 
Shivani Poddar,
Bachelors in Computer Sciences and MS in Exact Humanities, Sophomore
International Institute of Information Technology, Hyderabad