Re: [Dbpedia-gsoc] Dbpedia-gsoc Digest, Vol 17, Issue 7

Dimitris Kontokostas Fri, 11 Mar 2016 00:08:43 -0800

It looks like there is a mismatch between your download and extraction configs.
The default download configuration downloads all languages and creates this
CSV file.
The default extraction configuration tries to extract all languages and
tries to find this file.


Try changing the languages in your extraction config and limit them to only
the one(s) you downloaded.
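
As a rough illustration of the check described above, here is a minimal Python
sketch (not part of the extraction framework) that compares the `languages`
line of a properties file with the languages that were actually downloaded.
The helper names, the sample `base-dir` value, and the sample language list are
hypothetical, and special values such as @mappings are returned literally
rather than expanded.

```python
def configured_languages(properties_text):
    """Return the codes listed on the `languages=` line, or [] if absent."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip comments and anything that is not a key=value pair.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "languages":
            return [code.strip() for code in value.split(",") if code.strip()]
    return []


def missing_languages(properties_text, downloaded):
    """Languages the config asks for that are not in the downloaded set."""
    return [code for code in configured_languages(properties_text)
            if code not in downloaded]


# Hypothetical extraction config; the keys and values here are assumptions.
sample_config = """
# hypothetical extraction config
base-dir=/data/dumps
languages=en,de,fr
"""

# Only English was downloaded, so the other two are reported as missing.
print(missing_languages(sample_config, {"en"}))  # prints ['de', 'fr']
```

If the list is non-empty, either download those languages or remove them from
the `languages` line before running the extraction.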

On Fri, Mar 11, 2016 at 2:49 AM, Dilshan Pathirana <
[email protected]> wrote:

>
> Hi,
> I got this error when running the extractor:
>
> Exception in thread "main" java.io.FileNotFoundException:
> /media/dilshan/acdbf529-facc-431e-9355-7b7bc7cc4ab9/dilshan/data/wikipedias.csv
> (No such file or directory)
>
>     at java.io.FileInputStream.open0(Native Method)
>     at java.io.FileInputStream.open(FileInputStream.java:195)
>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>     at scala.io.Source$.fromFile(Source.scala:91)
>     at scala.io.Source$.fromFile(Source.scala:76)
>     at org.dbpedia.extraction.util.WikiInfo$.fromFile(WikiInfo.scala:26)
>     at
> org.dbpedia.extraction.util.ConfigUtils$.parseLanguages(ConfigUtils.scala:83)
>     at
> org.dbpedia.extraction.dump.extract.Config.loadExtractorClasses(Config.scala:77)
>     at org.dbpedia.extraction.dump.extract.Config.<init>(Config.scala:58)
>     at
> org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:27)
>     at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala
> I first downloaded half of the data dump using the downloader, but when
> I run the extractor I get this error. Please help me by telling me how to
> run the extractor.
> On Wednesday, 9 March 2016, Dilshan Pathirana <[email protected]>
> wrote:
>
>> Hi, thank you Piano and Dimitris. Can you please help me by giving proper
>> instructions on how to run the project from source code? I have set up the
>> project correctly, but I have no idea how to run it.
>>
>> On 9 March 2016 at 19:39, Dimitris Kontokostas <[email protected]> wrote:
>>
>>> in the server.default.properties configuration you can either
>>>
>>> 1. limit the languages to only one, e.g. en instead of @mappings
>>>
>>> # List of languages, e.g. 'en,de,fr' or '@mappings'
>>>
>>> languages=@mappings
>>>
>>>
>>> 2. change the mappingsDir to something that does not exist to force
>>> downloading from the server
>>>
>>> # ontology file. if it doesn't exist, load from server
>>> ontologyFile=../doesnotexists/ontology.xml
>>>
>>> # mappings dir. if it doesn't exist, load from server
>>> mappingsDir=../doesnotexists/mappings
>>>
>>>
>>>
>>>
>>> On Wed, Mar 9, 2016 at 2:32 PM, Dilshan Pathirana <
>>> [email protected]> wrote:
>>>
>>>> Hi, I get the following error when running the project in IntelliJ IDEA:
>>>>
>>>> Exception in thread "main" java.io.FileNotFoundException:
>>>> ../mappings/Mapping_ceb.xml (No such file or directory)
>>>>     at java.io.FileInputStream.open0(Native Method)
>>>>     at java.io.FileInputStream.open(FileInputStream.java:195)
>>>>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>>>>     at
>>>> org.dbpedia.extraction.sources.XMLSource$$anonfun$fromFile$1.apply(XMLSource.scala:32)
>>>>     at
>>>> org.dbpedia.extraction.sources.XMLSource$$anonfun$fromFile$1.apply(XMLSource.scala:32)
>>>>     at
>>>> org.dbpedia.extraction.sources.XMLReaderSource.foreach(XMLSource.scala:111)
>>>>     at
>>>> scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>>>>     at
>>>> org.dbpedia.extraction.sources.XMLReaderSource.map(XMLSource.scala:108)
>>>>     at
>>>> org.dbpedia.extraction.server.ExtractionManager.loadMappingPages(ExtractionManager.scala:166)
>>>>     at
>>>> org.dbpedia.extraction.server.ExtractionManager$$anonfun$loadMappingPages$1.apply(ExtractionManager.scala:147)
>>>>     at
>>>> org.dbpedia.extraction.server.ExtractionManager$$anonfun$loadMappingPages$1.apply(ExtractionManager.scala:147)
>>>>     at
>>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>>>     at
>>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>>>     at
>>>> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>     at
>>>> scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
>>>>     at
>>>> scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>>>>     at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>>>>     at
>>>> org.dbpedia.extraction.server.ExtractionManager.loadMappingPages(ExtractionManager.scala:147)
>>>>     at
>>>> org.dbpedia.extraction.server.DynamicExtractionManager.<init>(DynamicExtractionManager.scala:40)
>>>>     at org.dbpedia.extraction.server.Server.<init>(Server.scala:32)
>>>>     at org.dbpedia.extraction.server.Server$.main(Server.scala:74)
>>>>     at org.dbpedia.extraction.server.Server.main(Server.scala)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>     at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>>     at
>>>> com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
>>>>
>>>> On 9 March 2016 at 15:44, <[email protected]>
>>>> wrote:
>>>>
>>>>> Send Dbpedia-gsoc mailing list submissions to
>>>>>         [email protected]
>>>>>
>>>>> To subscribe or unsubscribe via the World Wide Web, visit
>>>>>         https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>>>> or, via email, send a message with subject or body 'help' to
>>>>>         [email protected]
>>>>>
>>>>> You can reach the person managing the list at
>>>>>         [email protected]
>>>>>
>>>>> When replying, please edit your Subject line so it is more specific
>>>>> than "Re: Contents of Dbpedia-gsoc digest..."
>>>>>
>>>>>
>>>>> Today's Topics:
>>>>>
>>>>>    1. Introducing myself. (ved mathai)
>>>>>    2. Re: Introduction and Interest in Table Extractor  Project
>>>>>       (Dimitris Kontokostas)
>>>>>    3. Re: GSoC + Setting up extraction-framework (Dimitris Kontokostas)
>>>>>
>>>>>
>>>>> ----------------------------------------------------------------------
>>>>>
>>>>> Message: 1
>>>>> Date: Wed, 9 Mar 2016 11:25:33 +0530
>>>>> From: ved mathai <[email protected]>
>>>>> Subject: [Dbpedia-gsoc] Introducing myself.
>>>>> To: [email protected]
>>>>> Message-ID:
>>>>>         <
>>>>> cak-vgpvchxgeoskusi-kj-dyoqe6o2dsymstw4kasi30q0c...@mail.gmail.com>
>>>>> Content-Type: text/plain; charset="utf-8"
>>>>>
>>>>> Hi,
>>>>> I am just taking this opportunity to introduce myself to the group. My
>>>>> name is Ved Mathai, and I am from Bangalore, India. I am a Masters
>>>>> student in Computer Science at the International Institute of
>>>>> Information Technology, Bangalore (IIIT-B). My research interests
>>>>> include web ontology and semantics, NLP, and information retrieval. I am
>>>>> learning Machine Learning, Game Theory and Agent-based modeling as part
>>>>> of course work.
>>>>> But for the last 8 months, I have been working on a project, which is
>>>>> actually another student's PhD project, which uses DBpedia very closely.
>>>>> It attempts to take a simple CSV file from, let's say, data.gov, about
>>>>> some topic, say commodity prices or traffic details. Many of these
>>>>> tables aren't topically mapped to an ontology. So by using type
>>>>> information from DBpedia (and skos-classes) we find the most common
>>>>> types (non-trivial) for each data value and store their frequency of
>>>>> occurrence. Then we map properties ?p (?s ?p ?o) where ?s is an item
>>>>> from one row, one column and ?o is the data from the same row, another
>>>>> column, and we map the frequency of this property occurring between
>>>>> columns. And from this not only will we know the topic (theme) but also
>>>>> how the columns relate to each other (scheme) so that we can now suggest
>>>>> tuples back to a dataset (say DBpedia itself).
>>>>> We are awaiting acceptance for our paper in VLDB this year for this. The
>>>>> code for this however is available on GitHub (not recent version). In
>>>>> this project, we faced tables where a lot of columns are date values
>>>>> which exist on Wikipedia but not in DBpedia or not in xsd:format. So it
>>>>> seemed plausible that this whole domain of date time
>>>>> <http://wiki.dbpedia.org/ideas/idea/156/parsing-time-information-as-xsddates-from-wikipedia-plain-text/>
>>>>> can take some working on from the DBpedia extractor so that other
>>>>> projects built upon this benefit. And I have got some time on my hands,
>>>>> so I thought I might as well make a proper contribution through GSoC to
>>>>> the main code, rather than make ad hoc versions here for my research.
>>>>> Thanks,
>>>>> Ved
>>>>> -------------- next part --------------
>>>>> An HTML attachment was scrubbed...
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> Message: 2
>>>>> Date: Wed, 9 Mar 2016 11:15:45 +0200
>>>>> From: Dimitris Kontokostas <[email protected]>
>>>>> Subject: Re: [Dbpedia-gsoc] Introduction and Interest in Table
>>>>>         Extractor       Project
>>>>> To: Sandro Coelho <[email protected]>
>>>>> Cc: Andrey Pechenezhskiy <[email protected]>,    "DBpedia GSoC
>>>>>         \(ML\)" <[email protected]>
>>>>> Message-ID:
>>>>>         <CA+u4+a0PJYN5rzq7gyfXw=QeCqvs=R5xBXztfV=R=
>>>>> [email protected]>
>>>>> Content-Type: text/plain; charset="utf-8"
>>>>>
>>>>> Hi Andrey & welcome,
>>>>>
>>>>> I sent you a Slack invitation. Regarding your project questions, I think
>>>>> it is best if you ask them on the ideas page directly.
>>>>>
>>>>> Cheers,
>>>>> Dimitris
>>>>>
>>>>> On Wed, Mar 9, 2016 at 12:29 AM, Sandro Coelho <
>>>>> [email protected]>
>>>>> wrote:
>>>>>
>>>>> > Welcome to DBpedia Andrey!
>>>>> >
>>>>> > Nice to know that you are working on warm-up tasks. Keep going and
>>>>> > please use the ideas page to discuss details for each project.
>>>>> >
>>>>> > About Slack, @Dimitris can provide you more details.
>>>>> >
>>>>> > All the best,
>>>>> >
>>>>> >
>>>>> > 2016-03-08 9:54 GMT-03:00 Andrey Pechenezhskiy <
>>>>> [email protected]>:
>>>>> >
>>>>> >> Hello!
>>>>> >>
>>>>> >> My name is Andrey Pechenezhskiy. I have been studying at Perm State
>>>>> >> University, Russia, for six years. My research interests lie in the
>>>>> >> fields of NLP and web mining for Competitive Intelligence tasks.
>>>>> >>
>>>>> >> I am interested in the table extractor project
>>>>> >> <http://wiki.dbpedia.org/ideas/idea/59/the-table-extractor/>, which
>>>>> >> aims to extract data hidden in tables, because I have experience in
>>>>> >> web content mining and Scala. I have studied the code of the soccer
>>>>> >> extractor <https://bitbucket.org/tsiteam/soccer-extractor>, which
>>>>> >> parses the Wikipedia template "CarrieraSportivo" and composes an RDF
>>>>> >> graph. I decided to continue working with the football domain as a
>>>>> >> first step because this domain contains many different tables. I have
>>>>> >> been researching the Wikipedia templates that format tables. Then I
>>>>> >> worked with the extraction-framework and found some infobox mappings
>>>>> >> for the table templates that could be useful.
>>>>> >>
>>>>> >> I think the project will be based on the systematization of
>>>>> >> hypothesis testing results. So, should the result of the project be a
>>>>> >> new table extractor in the extraction-framework, and should the
>>>>> >> intermediate work of the project be Scala scripts without the
>>>>> >> extraction-framework?
>>>>> >>
>>>>> >> I will continue to explore the extraction-framework, Wikipedia
>>>>> >> tables, and articles about table extraction. Then I will write the
>>>>> >> draft of the proposal and implement an extractor for some table
>>>>> >> templates in Scala. I would appreciate it if you invited me to the
>>>>> >> DBpedia #gsoc Slack channel or gave me some suggestions.
>>>>> >>
>>>>> >> Thanks in advance!
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> ------------------------------------------------------------------------------
>>>>> >> Transform Data into Opportunity.
>>>>> >> Accelerate data analysis in your applications with
>>>>> >> Intel Data Analytics Acceleration Library.
>>>>> >> Click to learn more.
>>>>> >> http://makebettercode.com/inteldaal-eval
>>>>> >> _______________________________________________
>>>>> >> Dbpedia-gsoc mailing list
>>>>> >> [email protected]
>>>>> >> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Sandro
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>> --
>>>>> Kontokostas Dimitris
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> Message: 3
>>>>> Date: Wed, 9 Mar 2016 12:14:05 +0200
>>>>> From: Dimitris Kontokostas <[email protected]>
>>>>> Subject: Re: [Dbpedia-gsoc] GSoC + Setting up extraction-framework
>>>>> To: Nico Del Piano <[email protected]>
>>>>> Cc: "DBpedia GSoC \(ML\)" <[email protected]>
>>>>> Message-ID:
>>>>>         <CA+u4+a3+QVM4j51ps=gN=
>>>>> [email protected]>
>>>>> Content-Type: text/plain; charset="utf-8"
>>>>>
>>>>> Welcome to DBpedia Nicolas!
>>>>>
>>>>> If you run it through IntelliJ, you probably need to set the working
>>>>> directory to the module or Maven directory.
>>>>> If this fixes the problem, can you also update the documentation
>>>>> accordingly?
>>>>>
>>>>> [image: Inline image 1]
>>>>>
>>>>> Cheers,
>>>>> Dimitris
>>>>>
>>>>> On Wed, Mar 9, 2016 at 12:00 AM, Nico Del Piano <[email protected]>
>>>>> wrote:
>>>>>
>>>>> > Hi guys!
>>>>> >
>>>>> > I'm a Computer Science student from Rosario, Argentina. Last year I
>>>>> > participated in the GSoC for Haskell.org and now I definitely want to
>>>>> > participate again. I find DBpedia a very interesting project and a
>>>>> > great community to be part of, and I'm looking for a cool project to
>>>>> > work on inside this organization. I'm keen on functional
>>>>> > languages (such as Scala, Haskell, and so on), and I came up with the
>>>>> > following idea which I'm very interested in:
>>>>> > http://wiki.dbpedia.org/ideas/idea/43/upgrade-fix-sweble-parser/
>>>>> >
>>>>> > To begin with, I'd like to start with the warm-up tasks, and I was able
>>>>> > to properly set up the environment to work with the
>>>>> > extraction-framework. I followed this guide:
>>>>> > https://github.com/dbpedia/extraction-framework/wiki/Setting-up-intellij-idea
>>>>> > However, when I run the project, I get the following error:
>>>>> >
>>>>> > Exception in thread "main" java.io.FileNotFoundException:
>>>>> > ../mappings/Mapping_ceb.xml (No such file or directory)
>>>>> >
>>>>> > Could you provide me with some help with this issue? Do I need to do
>>>>> > something else?
>>>>> >
>>>>> > Thanks in advance!
>>>>> >
>>>>> > Nicolas.
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Kontokostas Dimitris
>>>>> -------------- next part --------------
>>>>> A non-text attachment was scrubbed...
>>>>> Name: image.png
>>>>> Type: image/png
>>>>> Size: 26326 bytes
>>>>> Desc: not available
>>>>>
>>>>> ------------------------------
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Dbpedia-gsoc mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>>>>
>>>>>
>>>>> End of Dbpedia-gsoc Digest, Vol 17, Issue 7
>>>>> *******************************************
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Kontokostas Dimitris
>>>
>>
>>


-- 
Kontokostas Dimitris

_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
