Re: [Dbpedia-developers] Refactoring the extraction Framework core to accept new formats

Dimitris Kontokostas Sun, 17 Nov 2013 22:11:06 -0800

On Mon, Nov 18, 2013 at 7:04 AM, Hady elsahar <[email protected]> wrote:


> *Some questions:*
>
> 1-
> In the trait Extractor, which all our extractors implement, if we
> change it from:
>
> trait Extractor extends Mapping[PageNode]
>
> to
>
> trait Extractor[T] extends Mapping[T]
>
>
> we will need to refactor all Extractor classes to declare which type of
> data they accept.
> Do you think this is OK?
>

I think this is a good choice; it will also help catch errors at compile
time, and 'T' will never change for an Extractor.


> I tried for some time to tweak it using Scala upper and lower type bounds
> to make PageNode the default type when the type is not set, but I didn't
> manage to do it. (Of course, we wouldn't need that if we added the
> type to all existing constructors.)
>

I think lower/upper bounds only express subtype/supertype constraints. Maybe
there is a Scala tweak here that I am not aware of, but if no one objects,
let's keep it simple.
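A minimal sketch of the proposed signature, assuming simplified stand-ins for PageNode, Quad and Mapping (these are not the framework's actual definitions). Scala has no default type arguments, so a type alias (the hypothetical PageNodeExtractor) is one way to keep existing declarations short while still fixing 'T' once per extractor:

```scala
object ExtractorSketch {
  // Hypothetical stand-ins; not the framework's actual classes.
  final case class PageNode(title: String)
  final case class Quad(subject: String, predicate: String, value: String)

  // Assumed, simplified shape of Mapping: one input in, quads out.
  trait Mapping[T] {
    def extract(input: T): Seq[Quad]
  }

  // Proposed refactoring: the accepted input type becomes a parameter.
  trait Extractor[T] extends Mapping[T]

  // Scala has no default type arguments; an alias (hypothetical name)
  // keeps existing wikitext extractors short to declare.
  type PageNodeExtractor = Extractor[PageNode]

  // Hypothetical extractor that emits the page title as a label.
  class ExampleTitleExtractor extends PageNodeExtractor {
    def extract(input: PageNode): Seq[Quad] =
      Seq(Quad(input.title, "rdfs:label", input.title))
  }

  def main(args: Array[String]): Unit = {
    val extractor = new ExampleTitleExtractor
    println(extractor.extract(PageNode("Berlin")))
    // extractor.extract("Berlin") would be rejected at compile time,
    // because the input type T is fixed to PageNode.
  }
}
```

The alias is only a naming convenience: every extractor still states its input type explicitly, which is what gives the compile-time check.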

Best,
Dimitris




>
>
>
>
> On Tue, Nov 12, 2013 at 9:32 AM, Dimitris Kontokostas <
> [email protected]> wrote:
>
>>
>>
>>
>> On Sun, Nov 10, 2013 at 7:23 PM, Hady elsahar <[email protected]> wrote:
>>
>>> Hello All,
>>>
>>> In order to merge the code
>>> <https://github.com/hadyelsahar/extraction-framework/commits/parseJson>
>>> written for the GSoC project on the Wikidata extraction process, we first
>>> need to work on issue #38, "Refactoring the core to accept new formats"
>>> <https://github.com/dbpedia/extraction-framework/issues/38>,
>>> following JC's suggestion here
>>> <https://github.com/dbpedia/extraction-framework/pull/35#issuecomment-16187074>
>>> and the diagram here
>>> <https://github.com/dbpedia/extraction-framework/pull/35#issuecomment-16187074>.
>>>
>>>
>>> Below are some points that we may face:
>>>
>>>    - In the design JC suggested, CompositeExtractor sometimes accepts
>>>    JValue, WikiPage or PageNode. We have two alternatives to implement
>>>    this:
>>>       - handle this automatically by checking which type each of the
>>>       inner extractors accepts, calling the matching parser and passing
>>>       suitable data to the inner extractor
>>>       - handle this by hardcoding, i.e. making JValueCompositeExtractor,
>>>       PageNodeCompositeExtractor, etc., either by templating or by
>>>       creating subclasses
>>>
>> I think that the first is the goal, but I wouldn't mind if you started
>> with the second approach if it makes it easier for you. Once we have it
>> running, we can refactor later.
>>
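For the second alternative, a sketch of the "templating" variant, assuming simplified stand-ins for the framework types: a single generic CompositeExtractor[T] that only accepts inner extractors of the same input type, with PageNodeCompositeExtractor reduced to a subclass that fixes T. The two sample extractors are hypothetical.

```scala
object CompositeSketch {
  // Hypothetical stand-ins; not the framework's actual classes.
  final case class PageNode(title: String)
  final case class Quad(subject: String, predicate: String, value: String)

  trait Extractor[T] {
    def extract(input: T): Seq[Quad]
  }

  // "Templating": one composite per input type T. Its inner extractors
  // must all consume T, so mixing in an extractor for another input
  // type is a compile-time error.
  class CompositeExtractor[T](inner: Seq[Extractor[T]]) extends Extractor[T] {
    def extract(input: T): Seq[Quad] = inner.flatMap(_.extract(input))
  }

  // "Subclassing": a named composite that simply fixes T.
  class PageNodeCompositeExtractor(inner: Seq[Extractor[PageNode]])
      extends CompositeExtractor[PageNode](inner)

  // Two hypothetical PageNode extractors used for illustration.
  val titleExtractor: Extractor[PageNode] = new Extractor[PageNode] {
    def extract(input: PageNode): Seq[Quad] =
      Seq(Quad(input.title, "title", input.title))
  }
  val lengthExtractor: Extractor[PageNode] = new Extractor[PageNode] {
    def extract(input: PageNode): Seq[Quad] =
      Seq(Quad(input.title, "titleLength", input.title.length.toString))
  }

  def main(args: Array[String]): Unit = {
    val composite = new PageNodeCompositeExtractor(Seq(titleExtractor, lengthExtractor))
    println(composite.extract(PageNode("Berlin")))
  }
}
```

With the generic class in place, the named subclasses add nothing but a name, which is one reason the automatic approach can later be layered on top without changing the extractors.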
>>>
>>>
>>>
>>>    - Also, the old design worked like this:
>>>       - once we create a new extractor, to run it we add it to the config
>>>       file
>>>       - ConfigLoader loads it inside the CompositeExtractor
>>>       - WikiParserWrapper
>>>       <https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/impl/WikiParserWrapper.scala>
>>>       decides which parser is activated
>>>
>>> We have to tweak this a little in our new design to allow the new
>>> level of CompositeExtractors to choose which extractors to load and which
>>> not.
>>>
>>>
>>> So why not make our design as follows:
>>>
>>>
>>> - allow CompositeExtractors to accept and pass only WikiPage objects to
>>> their inner extractors
>>>
>>>
>> This is what it does in JC's design; there is just an extra level of
>> CompositeExtractors (more later).
>>
>>
>>>  - Devise a way to map Extractors and ParseExtractors to a PageType (an
>>> enum in the Extractor class, defined in the subclasses)
>>>
>>> - ConfigLoader:
>>>
>>> - loads all Extractors from the config file
>>> - creates two ParseExtractors (JsonParseExtractor, WikiParseExtractor)
>>> - checks the type of each needed extractor: if it's JSON, loads it into
>>> the JsonParseExtractor; if it's WikiText, loads it into the WikiParseExtractor
>>>
>>>
>> This can be done by the first CompositeExtractor:
>> gather all Extractor[WikiPage]s, encapsulate them in a CompositeExtractor
>> and extract them;
>> gather all Extractor[PageNode]s, encapsulate them in a
>> CompositeExtractor and pass them to the WikitextParseExtractor.
>>            WikitextParseExtractor: if the page type is WikiText, parse it and
>> pass a PageNode to all enabled Extractor[PageNode]s. Otherwise, return an
>> empty Quad list.
>> The same goes for the JsonValueParseExtractor.
>>
>> This way you don't have to change anything in the configuration loading;
>> you just move the parsing step further down.
>>
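A minimal sketch of this layering, assuming simplified stand-ins for WikiPage, PageNode and Quad, plain strings ("wikitext", "json") as page-format labels, and a hypothetical parseWikitext in place of the real WikiParser. A generic ParseExtractor[T] sits inside the top-level CompositeExtractor, receives WikiPage objects like any other extractor, and either parses and delegates to its composite of Extractor[T] or returns an empty quad list:

```scala
object ParseExtractorSketch {
  // Hypothetical stand-ins; not the framework's actual classes.
  final case class WikiPage(title: String, format: String, source: String)
  final case class PageNode(title: String, text: String)
  final case class Quad(subject: String, predicate: String, value: String)

  trait Extractor[T] {
    def extract(input: T): Seq[Quad]
  }

  class CompositeExtractor[T](inner: Seq[Extractor[T]]) extends Extractor[T] {
    def extract(input: T): Seq[Quad] = inner.flatMap(_.extract(input))
  }

  // Accepts WikiPage like any other extractor. If the page has the
  // expected format it is parsed to T and handed to `inner`; otherwise
  // (or if parsing yields None) an empty quad list is returned.
  class ParseExtractor[T](format: String, parse: WikiPage => Option[T], inner: Extractor[T])
      extends Extractor[WikiPage] {
    def extract(page: WikiPage): Seq[Quad] =
      if (page.format != format) Seq.empty
      else parse(page).map(inner.extract).getOrElse(Seq.empty)
  }

  // Hypothetical parser standing in for the real WikiParser.
  def parseWikitext(page: WikiPage): Option[PageNode] =
    Some(PageNode(page.title, page.source))

  // Hypothetical PageNode extractor: first sentence as an abstract.
  val firstSentence: Extractor[PageNode] = new Extractor[PageNode] {
    def extract(node: PageNode): Seq[Quad] =
      Seq(Quad(node.title, "abstract", node.text.takeWhile(_ != '.')))
  }

  // The top-level composite only ever sees WikiPage objects; a JSON
  // ParseExtractor[JValue] would be added to this list the same way.
  val root: Extractor[WikiPage] = new CompositeExtractor[WikiPage](Seq(
    new ParseExtractor[PageNode]("wikitext", parseWikitext, new CompositeExtractor(Seq(firstSentence)))
  ))

  def main(args: Array[String]): Unit = {
    println(root.extract(WikiPage("Berlin", "wikitext", "Berlin is a city. It is big.")))
    println(root.extract(WikiPage("Q64", "json", "{}")))
  }
}
```

Because the format check lives inside each ParseExtractor, no central component has to decide which parser runs, which matches the idea that the WikiParserWrapper dispatch becomes unnecessary.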
>>
>>>  - loads JsonParseExtractor, WikiParseExtractor and other extractors into a
>>> CompositeExtractor
>>>
>>> - CompositeExtractor:
>>>
>>> - sends the WikiPage to all inner Extractor objects (JsonParseExtractor,
>>> WikiParseExtractor, other normal Extractors)
>>>
>>> - JsonParseExtractor:
>>>
>>> - if the page format is JSON, runs the WikiPage object through the JSON
>>> parser and passes the JValue to all inner Extractors
>>>
>>> - otherwise, does nothing
>>>
>>> - WikiParseExtractor:
>>>
>>> - if the page format is wikitext, runs the WikiPage object through the
>>> WikiParser and passes the PageNode to all inner Extractors
>>>
>>> - otherwise, does nothing
>>>
>>>
>>> - WikiParserWrapper functionality becomes obsolete: as JC suggested,
>>> each parse extractor checks the page format, parses the page if it has
>>> that format and does nothing otherwise, so we can remove WikiParserWrapper
>>>
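For the ConfigLoader step of this proposal (checking the type of each extractor and loading it into the matching parse extractor), a sketch of the grouping under stated assumptions: instead of the PageType enum, it uses a small sealed hierarchy (hypothetical names OnWikiPage, OnPageNode, OnJson) so that each group keeps its static input type after splitting. A plain enum would need a cast, since type parameters are erased at runtime and an Extractor[PageNode] cannot be told apart from an Extractor[JValue] by pattern matching. All types are simplified stand-ins, and JValue here is only a placeholder.

```scala
object ConfigGroupingSketch {
  // Hypothetical stand-ins; JValue is a placeholder, not json4s's JValue.
  final case class WikiPage(title: String, format: String, source: String)
  final case class PageNode(title: String)
  final case class JValue(raw: String)
  final case class Quad(subject: String, predicate: String, value: String)

  trait Extractor[T] {
    def extract(input: T): Seq[Quad]
  }

  // Plays the role of the proposed per-extractor PageType enum, but each
  // case also remembers the extractor's input type, so no casts are needed.
  sealed trait Loaded
  final case class OnWikiPage(e: Extractor[WikiPage]) extends Loaded
  final case class OnPageNode(e: Extractor[PageNode]) extends Loaded
  final case class OnJson(e: Extractor[JValue]) extends Loaded

  final case class Groups(
      raw: Seq[Extractor[WikiPage]],      // run directly on WikiPage
      wikitext: Seq[Extractor[PageNode]], // would go to WikiParseExtractor
      json: Seq[Extractor[JValue]]        // would go to JsonParseExtractor
  )

  // The "check the type of each needed extractor" step.
  def group(loaded: Seq[Loaded]): Groups =
    Groups(
      loaded.collect { case OnWikiPage(e) => e },
      loaded.collect { case OnPageNode(e) => e },
      loaded.collect { case OnJson(e) => e }
    )

  // Hypothetical extractor that ignores its input and emits one quad.
  def constant[T](predicate: String): Extractor[T] = new Extractor[T] {
    def extract(input: T): Seq[Quad] = Seq(Quad("s", predicate, "o"))
  }

  // Stand-in for what ConfigLoader would build from the config file.
  val configured: Seq[Loaded] = Seq(
    OnPageNode(constant[PageNode]("infobox")),
    OnJson(constant[JValue]("label")),
    OnPageNode(constant[PageNode]("abstract")),
    OnWikiPage(constant[WikiPage]("pageId"))
  )

  def main(args: Array[String]): Unit = {
    val g = group(configured)
    println(s"raw=${g.raw.size} wikitext=${g.wikitext.size} json=${g.json.size}")
  }
}
```

The sealed hierarchy also lets the compiler warn if a new input format is added but not handled in the grouping step.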
>>>
>>> Pros would be:
>>>
>>>    - simpler design, fewer classes and fewer changes
>>>    - skips an extra level of composite extractors that doesn't add any
>>>    functionality
>>>    - overcomes the problem of different inputs and outputs for
>>>    CompositeExtractor
>>>    - the same config files would work
>>>
>>> Cons would be:
>>>
>>>    - it may be confusing that a ParseExtractor also contains inner
>>>    Extractors
>>>    - more functionality in the ConfigLoader
>>>    - we must specify for each Extractor what kind of pages it
>>>    needs to receive
>>>
>>>
>>>
>> Maybe I misunderstood, but the only difference I can see between your
>> diagram and JC's is a level of CompositeExtractor.
>> IMO this is just a design pattern that helps encapsulate multiple
>> extractors for a parser, and if this is your only concern we can skip it
>> for now.
>>
>> Cheers,
>> Dimitris
>>
>>
>>> Thanks,
>>> Regards
>>>
>>> -------------------------------------------------
>>> Hady El-Sahar
>>> Research Assistant
>>> Center of Informatics Sciences | Nile University <http://nileuniversity.edu.eg/>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Dbpedia-developers mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>>
>>>
>>
>>
>> --
>> Dimitris Kontokostas
>> Department of Computer Science, University of Leipzig
>> Research Group: http://aksw.org
>> Homepage:http://aksw.org/DimitrisKontokostas
>>
>
>
>
> --
> -------------------------------------------------
> Hady El-Sahar
> Research Assistant
> Center of Informatics Sciences | Nile University <http://nileuniversity.edu.eg/>
>
>
>
>
>
>


-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage:http://aksw.org/DimitrisKontokostas

