Re: [Dbpedia-developers] Refactoring Core to accept new formats [Updates]

Dimitris Kontokostas Wed, 11 Dec 2013 00:02:29 -0800

Hi JC,

On Wed, Dec 11, 2013 at 4:56 AM, Jona Christopher Sahnwaldt <[email protected]
> wrote:


> I just saw the discussion from mid November ("Refactoring the
> extraction Framework core to accept new formats"). I didn't follow the
> mails back then. Sorry, I should have written earlier.
>
> Most of what I said in my last mail could have been derived from this
> comment in Extractor.scala:
>
> Necessary to get some type safety in CompositeExtractor:
> Class[_ <: Extractor] can be checked at runtime, but Class[_ <:
> Mapping[PageNode]] can not.
>
> and this section in
> https://github.com/dbpedia/extraction-framework/pull/35 :
>
> Currently, the names of the classes and traits in the mappings package
> are a bit chaotic: the root of the type hierarchy is called Mapping.
> We should rename it to Extractor. The specific trait for extractors
> that handle PageNode objects is called Extractor. We should rename it
> to PageNodeExtractor or WikitextExtractor.
>

I was planning a major renaming/rearrangement of the mappings package but
didn't want Hady to do it.
However, this change will make things a lot easier for Hady.
@Hady, is it easier to do this directly in your code, or do you want me to
do it on the main repo and then merge? Either is fine by me.

Regarding the original design, I was also planning to follow it after Hady's
commit.
Sometimes discussing by mail takes more time than implementing, so I
wanted to get a good draft from Hady and take it from there.
This was part of Hady's project of course, but pushing too hard on core
refactoring would shift his focus.

Cheers,
Dimitris

>
> On 11 December 2013 03:36, Jona Christopher Sahnwaldt <[email protected]>
> wrote:
> > A few thoughts:
> >
> > * If we used a proper configuration framework like Spring, we wouldn't
> > need the ugly class loading stuff in Config.scala and
> > CompositeExtractor.scala. That's a general problem of DBpedia, not
> > specific to this refactoring.
> >
> > * Without a separate subclass of Mapping.scala for each type of parse
> > result, we lose runtime type safety. I think we should introduce these
> > classes, since each of them would consist of one line and a few
> > imports: "trait PageNodeExtractor extends Mapping[PageNode]" is the
> > whole code for one class, "trait JsonNodeExtractor extends
> > Mapping[JsonNode]" is the next, etc. The class names I use here should
> > be improved.
> >
> > * If these separate subclasses existed, you wouldn't need the type
> > tags etc. because type erasure is not a problem anymore. When
> > CompositeExtractor gets a list of classes, it could check each class
> > if it is a subclass of PageNodeExtractor or JsonNodeExtractor and put
> > the instantiated objects into separate lists.
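
A minimal sketch of the subtrait idea in the bullets above, under stated assumptions: PageNode, JsonNode and Mapping are empty stand-ins rather than the real framework types, and LabelExtractor, WikidataLabelExtractor and ExtractorLoader are hypothetical names. It also assumes every extractor class has a no-argument constructor, which may not hold for the real classes.

```scala
// Hypothetical stand-ins; the real framework types are richer than this.
class PageNode
class JsonNode
trait Mapping[N] { def extract(input: N): Seq[String] }

// One line per parse-result type, as proposed in the bullets above.
trait PageNodeExtractor extends Mapping[PageNode]
trait JsonNodeExtractor extends Mapping[JsonNode]

// Two illustrative extractors (names are assumptions of this sketch).
class LabelExtractor extends PageNodeExtractor {
  def extract(input: PageNode): Seq[String] = Seq("label")
}
class WikidataLabelExtractor extends JsonNodeExtractor {
  def extract(input: JsonNode): Seq[String] = Seq("wikidata-label")
}

object ExtractorLoader {
  // Splits the classes by subtrait with a plain runtime check, then
  // instantiates each one through its no-argument constructor.
  def load(classes: Seq[Class[_]]): (Seq[PageNodeExtractor], Seq[JsonNodeExtractor]) = {
    val pageNode = classes
      .filter(c => classOf[PageNodeExtractor].isAssignableFrom(c))
      .map(c => c.getDeclaredConstructor().newInstance().asInstanceOf[PageNodeExtractor])
    val json = classes
      .filter(c => classOf[JsonNodeExtractor].isAssignableFrom(c))
      .map(c => c.getDeclaredConstructor().newInstance().asInstanceOf[JsonNodeExtractor])
    (pageNode, json)
  }
}
```

Because PageNodeExtractor and JsonNodeExtractor are distinct runtime classes, isAssignableFrom can tell them apart, while Mapping[PageNode] and Mapping[JsonNode] erase to the same class.
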
> >
> > See below for a bit more.
> >
> > On 24 November 2013 20:40, Hady elsahar <[email protected]> wrote:
> >> Hello All ,
> >>
> >> considering the issue #38 refactoring the core to accept new formats, i
> >> guess the new core functionality is working now, what's needed is some
> >> modifications as well as your suggestions for updates and of course
> >> merging to the main branch
> >>
> >> what was done so far :
> >>
> >> 1- change Extractor Trait to accept [T] type argument [see commit]
> >> 2- change CompositeExtractor class to load any type of classes not only
> >> PageNode [see commit]
> >>
> >> 3- Refactoring the core [see commit ]
> >>
> >> added (loadToParsers) method to CompositeExtractor. This method will:
> >>
> >> * take a list of Extractors and split them by the type they accept
> >> * create a JsonParseExtractor object and load it with Extractor[Json format]
> >> * create a WikiParseExtractor object and load it with Extractor[PageNode]
> >> * create a CompositeExtractor object and load it with Extractor[WikiPage]
> >>
> >> Created ParseExtractor class which:
> >>
> >> * takes a WikiPageFormat as an argument and decides on a suitable parser
> >>   for it
> >> * gets loaded with Extractors
> >> * at runtime checks whether the page has the proper WikiPageFormat; if so,
> >>   parses it with the parser and passes the result to all inner Extractors
> >>
> >> WikiParseExtractor, CompositeExtractor are instances of the same class
> >> ParseExtractor but with different WikiPageFormat arguments
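
A rough sketch of the ParseExtractor behaviour described above, not the code from the commit. All names and signatures here are assumptions for illustration: WikiPageFormat is modelled as two case objects, WikiPage as a small case class, and the parser is passed in as a function instead of being chosen from the format.

```scala
// All names below are hypothetical stand-ins for the framework types.
sealed trait WikiPageFormat
case object WikiText extends WikiPageFormat
case object Json extends WikiPageFormat

case class WikiPage(title: String, format: WikiPageFormat, source: String)

trait Extractor[N] { def extract(input: N): Seq[String] }

// Wraps the extractors for one format: parse the page once, then hand the
// parse result to every inner extractor. Pages in another format yield nothing.
class ParseExtractor[N](
    format: WikiPageFormat,
    parse: WikiPage => N,
    inner: Seq[Extractor[N]]
) extends Extractor[WikiPage] {
  def extract(page: WikiPage): Seq[String] =
    if (page.format != format) Seq.empty
    else {
      val node = parse(page)
      inner.flatMap(_.extract(node))
    }
}
```

With this shape, the wiki-text and JSON variants would be two instances of the same class that differ only in the format and parser they are given.
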
> >>
> >>
> >> Next Steps :
> >>
> >> 1- Loading WikiData Extractors created in the GSoC project to this branch
> >> 2- in CompositeExtractor, in order to check for Extractor[T]: T is erased
> >> at runtime, so the cleanest way is to use Scala TypeTag, which needs Scala
> >> 2.10, so:
> >>
> >> * as a workaround I added a Type enumerator to the Extractor class
> >> * future work would be installing Scala 2.10, then replacing the enum with
> >>   a check for TypeTags
> >>
> >> 3- Get rid of the RootExtractor
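
For step 2 above, the enumeration workaround could look roughly like the sketch below. The names ExtractorType, inputType, GroupByInput and the two stub extractors are hypothetical; the actual enumeration in the branch may be shaped differently.

```scala
// Hypothetical names; the actual enumeration in the branch may differ.
object ExtractorType extends Enumeration {
  val PageNodeType, JsonNodeType, WikiPageType = Value
}

trait Extractor[N] {
  // Each extractor declares by hand which input type it accepts, because
  // the type argument N is erased and cannot be inspected at runtime.
  def inputType: ExtractorType.Value
  def extract(input: N): Seq[String]
}

// Stub extractors; String and Int stand in for real parse-result types.
class TextStub extends Extractor[String] {
  def inputType: ExtractorType.Value = ExtractorType.PageNodeType
  def extract(input: String): Seq[String] = Seq(input)
}
class IntStub extends Extractor[Int] {
  def inputType: ExtractorType.Value = ExtractorType.JsonNodeType
  def extract(input: Int): Seq[String] = Seq(input.toString)
}

object GroupByInput {
  // Groups extractors by their declared tag instead of by the erased N.
  def apply(all: Seq[Extractor[_]]): Map[ExtractorType.Value, Seq[Extractor[_]]] =
    all.groupBy(_.inputType)
}
```

The cost of this approach is that the tag is maintained by hand and the compiler does not check it against N, which is one reason a dedicated subtrait per input type is an alternative worth weighing.
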
> >>
> >> Questions:
> >> 1- Any suggestions or modifications needed ?
> >> 2- the only difference now from JC's design is that ParseExtractor passes
> >> WikiPage to all inner Extractors instead of collecting them in one
> >> CompositeExtractor
> >> it doesn't really add any new functionality, just following the pattern. so
> >
> > Sorry, but such a statement makes me a bit angry. Yes, it doesn't add
> > functionality, it "just follows a pattern", but what does that mean?
> > That's a bit like saying, object-oriented programming doesn't really
> > add new functionality over procedural programming. Or C didn't really
> > add functionality, in the 1960s programmers got the job done with
> > machine code. That's all true, but code that is understandable and has
> > a clean structure is just as important (and often more important) than
> > functionality. "Separation of concerns" is one important keyword here.
> > In case you don't own the Design Patterns book, you should definitely
> > get it.
> >
> > But as Dimitris said, we can change this later. I just really think we
> > should.
> >
> > JC
> >
> >> do you think we should add it ?
> >>
> >>
> >> thanks
> >> Regards
> >>
> >> -------------------------------------------------
> >> Hady El-Sahar
> >> Research Assistant
> >> Center of Informatics Sciences | Nile University
> >>
> >>
>
>
> ------------------------------------------------------------------------------
> Rapidly troubleshoot problems before they affect your business. Most IT
> organizations don't have a clear picture of how application performance
> affects their revenue. With AppDynamics, you get 100% visibility into your
> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics
> Pro!
> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
> _______________________________________________
> Dbpedia-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>



-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage:http://aksw.org/DimitrisKontokostas

