On 10 December 2013 13:09, Hady elsahar <[email protected]> wrote:
> Hi all,
>
> Latest changes:
>
> 1- pulled changes from the master branch after the merge with the Dump branch
> 2- solved merge conflicts (the remote master branch vs. the local changes of
> the core refactoring)
> 3- core now builds correctly and was tested on sample enwiki and Wikidata dumps
>
> Related commits: http://bit.ly/1hK77qH ,
The server module is commented out in pom.xml in this commit... http://bit.ly/1hK7bXx ,
> http://bit.ly/1hK7d1s , http://bit.ly/1hK7gdI
>
> thanks
> Regards
>
> On Tue, Nov 26, 2013 at 5:19 PM, Hady elsahar <[email protected]> wrote:
>>
>> Hi all,
>>
>> I guess we have here a working draft of the refactored core, with the
>> Wikidata extraction process loaded into it
>> [See Commit]
>>
>> Changes are:
>>
>> 1- added a JsonNode class to hold JSON values when a wiki page in JSON
>> format is parsed
>> 2- added Extractor[JsonNode] implementations for extracting Wikidata triples
>> (WikidataLLExtractor, WikidataLabelsExtractor, ...etc)
>> 3- new datasets for the new extractors in DBpediaDatasets.scala
>> 4- updated JsonWikiParser to return a JsonNode object containing the parsed JSON
>>
>> ps: the design of the WikidataExtraction process was developed to suit the
>> old design of the core; we no longer need that now that the core has
>> changed. Some of the next steps would be improving the design of the
>> WikidataExtraction (for example, having the parser return a generic JValue
>> instead of the JsonNode class)
>>
>> ps-2: I've tested the WikiDataExtractors on a sample of the extracted dump
>> of 20130818. The internal JSON format of Wikidata has changed a little
>> since then, hence recent dumps will raise exceptions in the JSON parser
>>
>>
>> thanks,
>> Regards
>>
>>
>> On Tue, Nov 26, 2013 at 10:55 AM, Dimitris Kontokostas
>> <[email protected]> wrote:
>>>
>>> Hi Hady,
>>>
>>>
>>> On Sun, Nov 24, 2013 at 9:40 PM, Hady elsahar <[email protected]>
>>> wrote:
>>>>
>>>> Hello All,
>>>>
>>>> regarding issue #38 (refactoring the core to accept new formats),
>>>> I guess the new core functionality is working now; what's needed is some
>>>> modifications as well as your suggestions for updates and, of course, merging
>>>> to the main branch
>>>>
>>>> what was done so far:
>>>>
>>>> 1- change the Extractor trait to accept a [T] type argument [see commit]
>>>> 2- change
>>>> CompositeExtractor class to load any type of classes, not only
>>>> PageNode [see commit]
>>>>
>>>> 3- Refactoring the core [see commit]
>>>>
>>>> added a (loadToParsers) method to CompositeExtractor; this method will:
>>>>
>>>> - take a list of Extractors and split them by the type they accept
>>>> - create a JsonParseExtractor object and load it with Extractor[Json format]
>>>> - create a WikiParseExtractor object and load it with Extractor[PageNode]
>>>> - create a CompositeExtractor object and load it with Extractor[WikiPage]
>>>>
>>>> Created the ParseExtractor class, which:
>>>>
>>>> - takes a WikiPageFormat as an argument and decides on a suitable parser for it
>>>> - gets loaded with Extractors
>>>> - at runtime checks whether the page has the proper WikiPageFormat; if so,
>>>> parses it with the parser and passes the result to all inner Extractors
>>>> - WikiParseExtractor, CompositeExtractor are instances of the same class
>>>> ParseExtractor but with a different WikiPageFormat argument
>>>
>>> good!
>>>
>>>> Next Steps:
>>>>
>>>> 1- Loading the WikiData Extractors created in the GSoC project into this
>>>> branch
>>>
>>>
>>> go ahead
>>>
>>>> 2- in CompositeExtractor we need to check for Extractor[T], but T is
>>>> erased at runtime, so the cleanest way is to use a Scala TypeTag, which
>>>> needs Scala 2.10, so:
>>>>
>>>> - as a workaround I added a Type enumerator to the Extractor class
>>>> - future work would be installing Scala 2.10, then replacing the enum
>>>> with a check on TypeTags
>>>
>>> We talked about this and we both don't like it :)
>>> creating super classes for WikiPageExtractor, PageNodeExtractor,
>>> JsonExtractor would result in less code, but since we'll change it anyway in
>>> 2.10, leave it like this and we will fix it after the merge
>>>
>>>>
>>>> 3- Get rid of the RootExtractor
>>>>
>>>> Questions:
>>>> 1- Any suggestions or modifications needed?
>>>
>>>
>>> I think there are some things that could be improved but we need to see
>>> the whole picture first.
>>> Let's not waste further time discussing design; go
>>> ahead and create a working draft first and we can always improve later
>>>
>>>> 2- the only difference now from JC's design is that ParseExtractor
>>>> passes the WikiPage to all inner Extractors instead of collecting them in one
>>>> CompositeExtractor
>>>> it doesn't really add any new functionality, just following the pattern.
>>>> so do you think we should add it?
>>>
>>>
>>> I think my comment above covers your question :)
>>>
>>> Good work Hady!
>>>
>>> Best,
>>> Dimitris
>>>>
>>>>
>>>>
>>>> thanks
>>>> Regards
>>>>
>>>> -------------------------------------------------
>>>> Hady El-Sahar
>>>> Research Assistant
>>>> Center of Informatics Sciences | Nile University
>>>>
>>>> _______________________________________________
>>>> Dbpedia-developers mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>>>
>>>
>>> --
>>> Dimitris Kontokostas
>>> Department of Computer Science, University of Leipzig
>>> Research Group: http://aksw.org
>>> Homepage: http://aksw.org/DimitrisKontokostas
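
For readers following the thread, the loadToParsers / ParseExtractor idea described in the quoted mail of Nov 24 can be sketched roughly as below. This is only an illustration under assumptions, not the code from the commits: the case classes, the InputType objects, the string-valued format field, the toy "parsers" and the three example extractors are all hypothetical stand-ins, and a plain list of Extractor[WikiPage] is used where the mail mentions a CompositeExtractor.

```scala
object ExtractorSketch {
  // Hypothetical stand-ins for the framework's page types (not the real classes).
  final case class WikiPage(title: String, format: String, source: String)
  final case class PageNode(title: String, text: String)
  final case class JsonNode(title: String, json: String)

  // Workaround for erasure: every extractor declares which input it accepts.
  sealed trait InputType
  case object WikiPageInput extends InputType
  case object PageNodeInput extends InputType
  case object JsonNodeInput extends InputType

  trait Extractor[T] {
    def inputType: InputType
    def extract(input: T): Seq[String]
  }

  // Parses pages of one format and passes the result to all inner extractors;
  // pages of any other format produce no output.
  class ParseExtractor[T](format: String, parse: WikiPage => T, inner: Seq[Extractor[T]]) {
    def extract(page: WikiPage): Seq[String] =
      if (page.format == format) {
        val node = parse(page)
        inner.flatMap(_.extract(node))
      } else Seq.empty
  }

  // Splits the extractors by declared input type and wires up one parser per format.
  def loadToParsers(extractors: Seq[Extractor[_]]): WikiPage => Seq[String] = {
    val byType = extractors.groupBy(_.inputType)
    // Unchecked cast: only as safe as the declared inputType is honest.
    def of[T](t: InputType): Seq[Extractor[T]] =
      byType.getOrElse(t, Seq.empty).map(_.asInstanceOf[Extractor[T]])

    val json = new ParseExtractor[JsonNode](
      "json", p => JsonNode(p.title, p.source), of[JsonNode](JsonNodeInput))
    val wiki = new ParseExtractor[PageNode](
      "wikitext", p => PageNode(p.title, p.source), of[PageNode](PageNodeInput))
    val raw = of[WikiPage](WikiPageInput)

    page => json.extract(page) ++ wiki.extract(page) ++ raw.flatMap(_.extract(page))
  }

  // Toy extractors, one per input type.
  object Labels extends Extractor[JsonNode] {
    val inputType: InputType = JsonNodeInput
    def extract(n: JsonNode): Seq[String] = Seq(s"label:${n.title}")
  }
  object Abstracts extends Extractor[PageNode] {
    val inputType: InputType = PageNodeInput
    def extract(n: PageNode): Seq[String] = Seq(s"abstract:${n.title}")
  }
  object Titles extends Extractor[WikiPage] {
    val inputType: InputType = WikiPageInput
    def extract(p: WikiPage): Seq[String] = Seq(s"title:${p.title}")
  }
}
```

Calling ExtractorSketch.loadToParsers(Seq(Labels, Abstracts, Titles)) gives a function that routes a JSON page to Labels and Titles only, and a wikitext page to Abstracts and Titles only. The cast inside the helper is the weak spot of the enum workaround, which is presumably why nobody in the thread likes it.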

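
The planned replacement of the enum by a runtime type check (next step 2 in the quoted mail) might look roughly like the sketch below. Note the swap: the mail names TypeTag, which lives in scala.reflect.runtime.universe and needs the scala-reflect library; this sketch uses scala.reflect.ClassTag instead, which ships with the standard library from Scala 2.10 and should be enough as long as the input types are not themselves generic. All class and method names here are hypothetical.

```scala
import scala.reflect.ClassTag

object TagSketch {
  // Hypothetical stand-ins for the framework's input types.
  final case class WikiPage(title: String)
  final case class PageNode(title: String)
  final case class JsonNode(title: String)

  // Each extractor captures a ClassTag for its input type when it is defined,
  // so the type survives erasure without a hand-written enum.
  abstract class Extractor[T](implicit val tag: ClassTag[T]) {
    def extract(input: T): Seq[String]
  }

  // Keeps only the extractors whose captured tag matches the requested type.
  def select[T](all: Seq[Extractor[_]])(implicit want: ClassTag[T]): Seq[Extractor[T]] =
    all.collect { case e if e.tag == want => e.asInstanceOf[Extractor[T]] }

  // Toy extractors for two of the three input types.
  object Labels extends Extractor[JsonNode] {
    def extract(n: JsonNode): Seq[String] = Seq(s"label:${n.title}")
  }
  object Abstracts extends Extractor[PageNode] {
    def extract(n: PageNode): Seq[String] = Seq(s"abstract:${n.title}")
  }
}
```

With this, TagSketch.select[JsonNode](Seq(Labels, Abstracts)) returns only Labels, and the cast is backed by the tag comparison rather than by a value each extractor has to declare by hand.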