Re: [Dbpedia-developers] Refactoring Core to accept new formats [Updates]

Jona Christopher Sahnwaldt Tue, 10 Dec 2013 17:23:39 -0800

On 10 December 2013 13:09, Hady elsahar <[email protected]> wrote:
> Hi all ,
>
> Latest changes
>
> 1- Pulled changes from the master branch after the merge with the dump branch
> 2- Solved merge conflicts (between the remote master branch and the local
> core refactoring changes)
> 3- The core now builds correctly and was tested on sample enwiki and Wikidata dumps
>
> Related commits :  http://bit.ly/1hK77qH ,


The server module is commented out in pom.xml in this commit...

> http://bit.ly/1hK7bXx ,
> http://bit.ly/1hK7d1s , http://bit.ly/1hK7gdI
>
> thanks
> Regards
>
> On Tue, Nov 26, 2013 at 5:19 PM, Hady elsahar <[email protected]> wrote:
>>
>> Hi all ,
>>
>> I think we now have a working draft of the refactored core, with the
>> Wikidata extraction process loaded into it
>> [See Commit]
>>
>> changes are :
>>
>> 1- Added a JsonNode class to hold JSON values when a WikiPage in JSON
>> format is parsed
>> 2- Added Extractor[JsonNode] implementations for extracting Wikidata triples
>> (WikidataLLExtractor, WikidataLabelsExtractor, etc.)
>> 3- Added new datasets for the new extractors in DBpediaDatasets.scala
>> 4- Updated JsonWikiParser to return a JsonNode object containing the parsed JSON
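A minimal Scala sketch of how an Extractor[JsonNode] in the spirit of WikidataLabelsExtractor could consume the parsed JSON. Everything below is a simplified, hypothetical stand-in: the JsonNode fields (itemId, labels), the Quad case class, the one-method Extractor[T] trait, the LabelsExtractor class and the "wikidata_labels" dataset name are assumptions for illustration, not the framework's actual classes or signatures.

```scala
// Hypothetical stand-in for a parsed Wikidata page: the item id plus its
// labels keyed by language code (the real JsonNode wraps the parsed JSON).
case class JsonNode(itemId: String, labels: Map[String, String])

// Hypothetical simplified quad: subject, predicate, object, dataset name.
case class Quad(subject: String, predicate: String, value: String, dataset: String)

// Generic extractor over an input type T (simplified to one method).
trait Extractor[T] {
  def extract(input: T): Seq[Quad]
}

// Emits one rdfs:label quad per language into a hypothetical labels dataset.
class LabelsExtractor extends Extractor[JsonNode] {
  private val rdfsLabel = "http://www.w3.org/2000/01/rdf-schema#label"

  def extract(node: JsonNode): Seq[Quad] =
    node.labels.toSeq.sortBy(_._1).map { case (lang, text) =>
      Quad(
        "http://www.wikidata.org/entity/" + node.itemId,
        rdfsLabel,
        "\"" + text + "\"@" + lang, // language-tagged literal
        "wikidata_labels")
    }
}
```

Sorting by language code only makes the output order deterministic; a real extractor would not need to do this.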
>>
>> PS: the design of the Wikidata extraction process was developed to suit the
>> old design of the core, which we no longer need now that the core has
>> changed. One of the next steps will be improving the design of the
>> Wikidata extraction (for example, having the parser return a generic JValue
>> instead of the JsonNode class).
>>
>> PS 2: I've tested the Wikidata extractors on a sample of the dump extracted
>> on 20130818. The internal JSON format of Wikidata has changed a little since
>> then, so recent dumps will raise exceptions in the JSON parser.
>>
>>
>> thanks,
>> Regards
>>
>>
>> On Tue, Nov 26, 2013 at 10:55 AM, Dimitris Kontokostas
>> <[email protected]> wrote:
>>>
>>> Hi Hady,
>>>
>>>
>>> On Sun, Nov 24, 2013 at 9:40 PM, Hady elsahar <[email protected]>
>>> wrote:
>>>>
>>>> Hello All ,
>>>>
>>>> Regarding issue #38, refactoring the core to accept new formats:
>>>> I think the new core functionality is working now. What's needed are some
>>>> modifications, your suggestions for updates and, of course, merging
>>>> into the main branch.
>>>>
>>>> What was done so far:
>>>>
>>>> 1- Changed the Extractor trait to accept a type argument [T] [see commit]
>>>> 2- Changed the CompositeExtractor class to load extractors of any type,
>>>> not only PageNode [see commit]
>>>>
>>>> 3- Refactored the core [see commit]
>>>>
>>>> Added a loadToParsers method to CompositeExtractor. This method will:
>>>>
>>>> take a list of extractors and split them by the type they accept
>>>> create a JsonParseExtractor object and load it with the Extractor[Json format] extractors
>>>> create a WikiParseExtractor object and load it with the Extractor[PageNode] extractors
>>>> create a CompositeExtractor object and load it with the Extractor[WikiPage] extractors
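The splitting step of loadToParsers could look roughly like this, using the type-enum workaround described under the next steps. This is a hypothetical sketch: the ExtractorType values, the simplified Extractor[T] trait, the LoadedExtractors result type and the LoadToParsersSketch object are assumptions, and it only returns the three groups instead of constructing the JsonParseExtractor, WikiParseExtractor and CompositeExtractor objects.

```scala
// Hypothetical stand-ins for the framework's input types.
case class WikiPage(title: String, source: String)
case class PageNode(title: String)
case class JsonNode(raw: String)

// Workaround for erasure: every extractor states which input type it accepts.
object ExtractorType extends Enumeration {
  val WikiPageType, PageNodeType, JsonNodeType = Value
}

trait Extractor[T] {
  def extractorType: ExtractorType.Value
  def extract(input: T): Seq[String]
}

// The three groups that would be handed to CompositeExtractor,
// WikiParseExtractor and JsonParseExtractor respectively.
case class LoadedExtractors(
    wikiPage: Seq[Extractor[WikiPage]],
    pageNode: Seq[Extractor[PageNode]],
    json: Seq[Extractor[JsonNode]])

object LoadToParsersSketch {
  def loadToParsers(extractors: Seq[Extractor[_]]): LoadedExtractors = {
    // The cast is unchecked at runtime; it is only safe if each extractor's
    // declared extractorType really matches its type argument T.
    def select[T](t: ExtractorType.Value): Seq[Extractor[T]] =
      extractors.filter(_.extractorType == t).map(_.asInstanceOf[Extractor[T]])

    LoadedExtractors(
      select[WikiPage](ExtractorType.WikiPageType),
      select[PageNode](ExtractorType.PageNodeType),
      select[JsonNode](ExtractorType.JsonNodeType))
  }
}
```

The weak point is visible in the cast: nothing stops an extractor from declaring the wrong enum value, which is why the thread treats the enum as temporary.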
>>>>
>>>> Created a ParseExtractor class which:
>>>>
>>>> takes a WikiPageFormat as an argument and decides the suitable parser for it
>>>> gets loaded with extractors
>>>> at runtime, checks whether the page has the proper WikiPageFormat; if so,
>>>> parses it with the parser and passes the result to all inner extractors
>>>> WikiParseExtractor and CompositeExtractor are instances of the same class,
>>>> ParseExtractor, but with a different WikiPageFormat argument
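A minimal sketch of the ParseExtractor idea: one class, parameterised by the page format it handles, that skips pages of other formats and otherwise parses the page once and passes the result to every inner extractor. The WikiPageFormat values and the simplified WikiPage and Extractor[T] definitions are hypothetical; also, the parser is passed in as a function here rather than chosen inside the class from the WikiPageFormat argument.

```scala
// Hypothetical page formats; the real WikiPageFormat may define other values.
object WikiPageFormat extends Enumeration {
  val WikiText, Json = Value
}

case class WikiPage(title: String, format: WikiPageFormat.Value, source: String)

trait Extractor[T] {
  def extract(input: T): Seq[String]
}

// One class for all formats: the format and its parser are constructor arguments.
class ParseExtractor[T](
    format: WikiPageFormat.Value,
    parser: WikiPage => T,
    extractors: Seq[Extractor[T]])
  extends Extractor[WikiPage] {

  // Pages of another format yield nothing; matching pages are parsed once
  // and the parse result is handed to every inner extractor.
  def extract(page: WikiPage): Seq[String] =
    if (page.format != format) Seq.empty
    else {
      val parsed = parser(page)
      extractors.flatMap(_.extract(parsed))
    }
}
```

Creating two instances of this class with different format arguments (and parsers) gives the per-format parse extractors described above.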
>>>
>>> good!
>>>
>>>> Next steps:
>>>>
>>>> 1- Load the Wikidata extractors created in the GSoC project into this
>>>> branch
>>>
>>>
>>> go ahead
>>>
>>>> 2- In CompositeExtractor, to check for Extractor[T] we face the problem that
>>>> T is erased at runtime, so the cleanest way is to use a Scala TypeTag, which
>>>> needs Scala 2.10. So:
>>>>
>>>> as a workaround, I added a type enumeration to the Extractor class
>>>> future work would be upgrading to Scala 2.10, then replacing the enum
>>>> with a check on TypeTags
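For comparison, a sketch of an erasure-free check that could replace the enum. It swaps in ClassTag (scala.reflect.ClassTag, part of the standard library since Scala 2.10) for TypeTag, because TypeTag needs the separate scala-reflect library; ClassTag only records the erased runtime class of T, which is enough here because WikiPage, PageNode and JsonNode are not themselves generic. The input case classes, the abstract Extractor[T] class and TypeDispatch.extractorsFor are hypothetical names.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-ins for the framework's input types.
case class WikiPage(title: String)
case class PageNode(title: String)
case class JsonNode(raw: String)

// The implicit ClassTag is resolved when a concrete extractor is created,
// so the otherwise-erased type argument T is still known at runtime.
abstract class Extractor[T](implicit val inputTag: ClassTag[T]) {
  def extract(input: T): Seq[String]
}

object TypeDispatch {
  // Keeps only the extractors whose input class is exactly T's class; no enum needed.
  def extractorsFor[T](extractors: Seq[Extractor[_]])(implicit tag: ClassTag[T]): Seq[Extractor[T]] =
    extractors
      .filter(_.inputTag.runtimeClass == tag.runtimeClass)
      .map(_.asInstanceOf[Extractor[T]])
}
```

Compared with the enum, the tag cannot disagree with the declared type argument, because the compiler supplies it.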
>>>
>>> We talked about this and neither of us likes it :)
>>> Creating super classes WikiPageExtractor, PageNodeExtractor and
>>> JsonExtractor would result in less code, but since we'll change it anyway in
>>> 2.10, leave it like this and we will fix it after the merge.
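The super-class suggestion can be sketched as follows: each marker trait fixes the type argument, so a plain runtime type test on the trait replaces the enum check. The trait names follow the email (WikiPageExtractor, PageNodeExtractor, JsonExtractor); the simplified Extractor[T], the input case classes and SuperClassDispatch.split are hypothetical.

```scala
// Hypothetical stand-ins for the framework's input types.
case class WikiPage(title: String)
case class PageNode(title: String)
case class JsonNode(raw: String)

trait Extractor[T] {
  def extract(input: T): Seq[String]
}

// Marker super classes: each one fixes T once and for all.
trait WikiPageExtractor extends Extractor[WikiPage]
trait PageNodeExtractor extends Extractor[PageNode]
trait JsonExtractor extends Extractor[JsonNode]

object SuperClassDispatch {
  // Matching on a marker trait works at runtime because the trait is a real
  // class, unlike the erased type argument T.
  def split(extractors: Seq[Extractor[_]])
      : (Seq[WikiPageExtractor], Seq[PageNodeExtractor], Seq[JsonExtractor]) =
    (extractors.collect { case e: WikiPageExtractor => e },
     extractors.collect { case e: PageNodeExtractor => e },
     extractors.collect { case e: JsonExtractor => e })
}
```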
>>>
>>>>
>>>> 3- Get rid of the RootExtractor
>>>>
>>>> Questions:
>>>> 1- Any suggestions or modifications needed?
>>>
>>>
>>> I think there are some things that could be improved, but we need to see
>>> the whole picture first. Let's not waste further time discussing design; go
>>> ahead and create a working draft first, and we can always improve it later.
>>>
>>>> 2- The only difference from JC's design now is that ParseExtractor
>>>> passes the WikiPage to all inner extractors instead of collecting them in one
>>>> CompositeExtractor.
>>>> Collecting them wouldn't really add any new functionality; it would just follow the pattern.
>>>> So do you think we should add it?
>>>
>>>
>>> I think my comment above covers your question :)
>>>
>>> Good work Hady!
>>>
>>> Best,
>>> Dimitris
>>>>
>>>>
>>>>
>>>> thanks
>>>> Regards
>>>>
>>>> -------------------------------------------------
>>>> Hady El-Sahar
>>>> Research Assistant
>>>> Center of Informatics Sciences | Nile University
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Dbpedia-developers mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>>>
>>>
>>>
>>>
>>> --
>>> Dimitris Kontokostas
>>> Department of Computer Science, University of Leipzig
>>> Research Group: http://aksw.org
>>> Homepage:http://aksw.org/DimitrisKontokostas
>>
>>
>>
>>
>> --
>> -------------------------------------------------
>> Hady El-Sahar
>> Research Assistant
>> Center of Informatics Sciences | Nile University
>>
>>
>
>
>
> --
> -------------------------------------------------
> Hady El-Sahar
> Research Assistant
> Center of Informatics Sciences | Nile University
>
>
