Re: [Dbpedia-developers] Refactoring Core to accept new formats [Updates]

Jona Christopher Sahnwaldt Tue, 10 Dec 2013 18:58:30 -0800

I just saw the discussion from mid November ("Refactoring the
extraction Framework core to accept new formats"). I didn't follow the
mails back then. Sorry, I should have written earlier.


Most of what I said in my last mail could have been derived from this
comment in Extractor.scala:

Necessary to get some type safety in CompositeExtractor:
Class[_ <: Extractor] can be checked at runtime, but Class[_ <:
Mapping[PageNode]] can not.

and this section in https://github.com/dbpedia/extraction-framework/pull/35 :

Currently, the names of the classes and traits in the mappings package
are a bit chaotic: the root of the type hierarchy is called Mapping.
We should rename it to Extractor. The specific trait for extractors
that handle PageNode objects is called Extractor. We should rename it
to PageNodeExtractor or WikitextExtractor.
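
As a rough sketch of the marker-trait idea (not code from the framework:
Mapping is simplified here, PageNode and JsonNode are empty stand-ins, and
LabelExtractor, WikidataLabelExtractor and ExtractorLoader are made-up names
for illustration), the runtime check could look something like this:

```scala
// Hypothetical stand-ins for the framework's parse-result types.
class PageNode
class JsonNode

// Simplified root of the hierarchy (assumed shape, not the real trait).
trait Mapping[N] {
  def extract(input: N): Seq[String]
}

// One-line marker traits. Unlike Mapping[PageNode], these survive erasure,
// so a Class object can be checked against them at runtime.
trait PageNodeExtractor extends Mapping[PageNode]
trait JsonNodeExtractor extends Mapping[JsonNode]

// Illustrative extractors, invented for this sketch.
class LabelExtractor extends PageNodeExtractor {
  def extract(input: PageNode): Seq[String] = Seq("label")
}
class WikidataLabelExtractor extends JsonNodeExtractor {
  def extract(input: JsonNode): Seq[String] = Seq("wikidata-label")
}

object ExtractorLoader {
  // Instantiates each class and sorts it into a list by its marker trait.
  // Classes that implement neither trait are silently ignored here.
  def load(classes: Seq[Class[_]]): (Seq[PageNodeExtractor], Seq[JsonNodeExtractor]) = {
    val wiki = classes
      .filter(c => classOf[PageNodeExtractor].isAssignableFrom(c))
      .map(_.getDeclaredConstructor().newInstance().asInstanceOf[PageNodeExtractor])
    val json = classes
      .filter(c => classOf[JsonNodeExtractor].isAssignableFrom(c))
      .map(_.getDeclaredConstructor().newInstance().asInstanceOf[JsonNodeExtractor])
    (wiki, json)
  }
}
```

The point is only that classOf[PageNodeExtractor].isAssignableFrom(c) is an
ordinary JVM check on the erased class, so no type tags are needed. The
sketch assumes every extractor has a no-argument constructor, which the real
extractors may not.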

On 11 December 2013 03:36, Jona Christopher Sahnwaldt <[email protected]> wrote:
> A few thoughts:
>
> * If we used a proper configuration framework like Spring, we wouldn't
> need the ugly class loading stuff in Config.scala and
> CompositeExtractor.scala. That's a general problem of DBpedia, not
> specific to this refactoring.
>
> * Without a separate subclass of Mapping.scala for each type of parse
> result, we lose runtime type safety. I think we should introduce these
> classes, since each of them would consist of one line and a few
> imports: "trait PageNodeExtractor extends Mapping[PageNode]" is the
> whole code for one class, "trait JsonNodeExtractor extends
> Mapping[JsonNode]" is the next, etc. The class names I use here should
> be improved.
>
> * If these separate subclasses existed, you wouldn't need the type
> tags etc. because type erasure is not a problem anymore. When
> CompositeExtractor gets a list of classes, it could check each class
> if it is a subclass of PageNodeExtractor or JsonNodeExtractor and put
> the instantiated objects into separate lists.
>
> See below for a bit more.
>
> On 24 November 2013 20:40, Hady elsahar <[email protected]> wrote:
>> Hello All ,
>>
>> considering issue #38, refactoring the core to accept new formats: I
>> guess the new core functionality is working now. What's needed is some
>> modifications as well as your suggestions for updates, and of course merging
>> to the main branch
>>
>> what was done so far :
>>
>> 1- change Extractor Trait to accept [T] type argument [see commit]
>> 2- change CompositeExtractor class to load any type of classes not only
>> PageNode [see commit]
>>
>> 3- Refactoring the core [see commit ]
>>
>> added a (loadToParsers) method to CompositeExtractor. This method will:
>>
>> - take a list of Extractors and split them by the type they accept
>> - create a JsonParseExtractor object and load it with Extractor[Json format]
>> - create a WikiParseExtractor object and load it with Extractor[PageNode]
>> - create a CompositeExtractor object and load it with Extractor[WikiPage]
>>
>> Created a ParseExtractor class which:
>>
>> - takes a WikiPageFormat as an argument and decides on a suitable parser
>> for it
>> - gets loaded with Extractors
>> - at runtime, checks whether the page has the proper WikiPageFormat; if so,
>> parses it with the parser and passes the result to all inner Extractors
>> - WikiParseExtractor and CompositeExtractor are instances of the same class,
>> ParseExtractor, but with different WikiPageFormat arguments
>>
>>
>> Next Steps :
>>
>> 1- Loading WikiData Extractors created in the GSoC project to this branch
>> 2- in CompositeExtractor, we need to check for Extractor[T], but T is
>> erased at runtime, so the cleanest way is to use Scala TypeTags, which need
>> Scala 2.10. So:
>>
>> - as a workaround I added a Type enumeration to the Extractor class
>> - future work would be installing Scala 2.10, then replacing the enum with
>> a check for TypeTags
>>
>> 3- Get rid of the RootExtractor
>>
>> Questions:
>> 1- Any suggestions or modifications needed ?
>> 2- the only difference now from JC's design is that ParseExtractor passes
>> the WikiPage to all inner Extractors instead of collecting them in one
>> CompositeExtractor
>> it doesn't really add any new functionality, just following the pattern. so
>
> Sorry, but such a statement makes me a bit angry. Yes, it doesn't add
> functionality, it "just follows a pattern", but what does that mean?
> That's a bit like saying, object-oriented programming doesn't really
> add new functionality over procedural programming. Or C didn't really
> add functionality, in the 1960s programmers got the job done with
> machine code. That's all true, but code that is understandable and has
> a clean structure is just as important (and often more important) than
> functionality. "Separation of concerns" is one important keyword here.
> In case you don't own the Design Patterns book, you should definitely
> get it.
>
> But as Dimitris said, we can change this later. I just really think we should.
>
> JC
>
>> do you think we should add it ?
>>
>>
>> thanks
>> Regards
>>
>> -------------------------------------------------
>> Hady El-Sahar
>> Research Assistant
>> Center of Informatics Sciences | Nile University
>>
>>

