[Dbpedia-developers] Refactoring Core to accept new formats [Updates]

Hady elsahar Sun, 24 Nov 2013 11:40:48 -0800

Hello all,

Regarding issue
#38<https://github.com/dbpedia/extraction-framework/issues/38>
(refactoring the core to accept new formats): I think the new core
functionality is working now. What is still needed is some modifications,
your suggestions for updates, and of course merging into the main branch.


What has been done so far:

1- Changed the Extractor trait to accept a type parameter [T] [see
commit<https://github.com/hadyelsahar/extraction-framework/commit/e26ef813dad098d573be34191dfaef13c78b5986>
]
2- Changed the CompositeExtractor class to load extractors for any input
type, not only PageNode [see
commit<https://github.com/hadyelsahar/extraction-framework/commit/17dcaa8b2988e7fc8676532fa849fff1eabec9d0>
]

3- Refactored the core [see commit
<https://github.com/hadyelsahar/extraction-framework/commit/9ad75cd864d12025d2872b4e3c6cbe4d4fae3681>
]

   - Added a loadToParsers method to CompositeExtractor. This method will:

      - take a list of Extractors and split them by the type they accept
      - create a JsonParseExtractor object and load it with the
      Extractor[Json format] instances
      - create a WikiParseExtractor object and load it with the
      Extractor[PageNode] instances
      - create a CompositeExtractor object and load it with the
      Extractor[WikiPage] instances

   - Created a ParseExtractor class which:

      - takes a WikiPageFormat as an argument and decides on a suitable
      parser for it
      - gets loaded with Extractors
      - at runtime, checks whether the page has the proper WikiPageFormat;
      if so, parses it with that parser and passes the result to all inner
      Extractors
      - WikiParseExtractor and CompositeExtractor are instances of the same
      class, ParseExtractor, but with different WikiPageFormat arguments
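
To make the structure above easier to discuss, here is a minimal sketch of it. It is not the code from the commits: WikiText, Json, JsonNode, the String output (the framework emits quads) and the toy parse functions are all assumptions made for illustration. JsonParseExtractor and WikiParseExtractor are modelled as two instances of one ParseExtractor class, which is my reading of the description, not a confirmed detail.

```scala
// Hypothetical stand-ins for the framework's types; names and fields are assumptions.
sealed trait WikiPageFormat
case object WikiText extends WikiPageFormat
case object Json extends WikiPageFormat

final case class WikiPage(title: String, format: WikiPageFormat, source: String)
final case class PageNode(title: String, text: String)
final case class JsonNode(title: String, raw: String)

// An extractor is parameterised by the input type it accepts.
// Plain strings stand in for the framework's output to keep the sketch short.
trait Extractor[T] {
  def extract(input: T): Seq[String]
}

// Runs every wrapped extractor on the same input and concatenates the results.
class CompositeExtractor[T](extractors: Seq[Extractor[T]]) extends Extractor[T] {
  def extract(input: T): Seq[String] = extractors.flatMap(_.extract(input))
}

// Accepts a raw WikiPage, parses it only if the page has the expected format,
// and hands the parsed node to every inner extractor.
class ParseExtractor[N](
    format: WikiPageFormat,
    parse: WikiPage => N,
    inner: Seq[Extractor[N]]
) extends Extractor[WikiPage] {
  def extract(page: WikiPage): Seq[String] =
    if (page.format == format) {
      val node = parse(page)
      inner.flatMap(_.extract(node))
    } else Seq.empty
}

object ParseExtractorDemo {
  val labelExtractor: Extractor[PageNode] = new Extractor[PageNode] {
    def extract(node: PageNode): Seq[String] = Seq("label:" + node.title)
  }
  val idExtractor: Extractor[JsonNode] = new Extractor[JsonNode] {
    def extract(node: JsonNode): Seq[String] = Seq("id:" + node.title)
  }

  // Toy parsers: the real ones would build a proper syntax tree.
  val wikiParseExtractor = new ParseExtractor[PageNode](
    WikiText, p => PageNode(p.title, p.source), Seq(labelExtractor))
  val jsonParseExtractor = new ParseExtractor[JsonNode](
    Json, p => JsonNode(p.title, p.source), Seq(idExtractor))

  // The top-level composite only ever sees Extractor[WikiPage].
  val root = new CompositeExtractor[WikiPage](
    Seq(wikiParseExtractor, jsonParseExtractor))

  def run(page: WikiPage): Seq[String] = root.extract(page)
}
```

With this layout a page in the wrong format is skipped by the wrapper itself, so the inner extractors never need to look at the page format.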


*Next steps:*

1- Load the WikiData Extractors created in the GSoC project into this branch
2- In CompositeExtractor we need to check the T of each Extractor[T], but T
is erased at runtime. The cleanest way is to use Scala TypeTags, which need
Scala 2.10, so:

   - as a workaround I added a Type enumeration to Extractor
   - future work would be to move to Scala 2.10, then replace the enum
   with a check on TypeTags

3- Get rid of the RootExtractor
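
For step 2, a rough sketch of the two ways to recover the erased T may help. Everything here is hypothetical (InputType, MarkedExtractor, TaggedExtractor, SplitByInputType are names I made up). Note that the second variant uses ClassTag rather than TypeTag: ClassTag also arrived with Scala 2.10 and lives in the standard library, but it only records the erased runtime class, which should be enough to tell PageNode, WikiPage and a JSON node apart. A TypeTag would be needed only if the full generic type mattered.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-ins for the framework's input types.
final case class WikiPage(title: String)
final case class PageNode(title: String)
final case class JsonNode(title: String)

// Workaround: every extractor names its input type explicitly.
object InputType extends Enumeration {
  val WikiPageType, PageNodeType, JsonNodeType = Value
}

trait MarkedExtractor[T] {
  def inputType: InputType.Value
  def extract(input: T): Seq[String]
}

// Later alternative: the compiler records the runtime class of T in a ClassTag.
abstract class TaggedExtractor[T](implicit val tag: ClassTag[T]) {
  def extract(input: T): Seq[String]
}

object SplitByInputType {
  // T is erased at runtime, so the grouping key must come from the marker or the tag.
  def byMarker(es: Seq[MarkedExtractor[_]]): Map[InputType.Value, Seq[MarkedExtractor[_]]] =
    es.groupBy(_.inputType)

  def byTag(es: Seq[TaggedExtractor[_]]): Map[Class[_], Seq[TaggedExtractor[_]]] =
    es.groupBy(_.tag.runtimeClass)
}

object SplitDemo {
  class Labels extends MarkedExtractor[PageNode] {
    val inputType: InputType.Value = InputType.PageNodeType
    def extract(n: PageNode): Seq[String] = Seq("label:" + n.title)
  }
  class Ids extends MarkedExtractor[JsonNode] {
    val inputType: InputType.Value = InputType.JsonNodeType
    def extract(n: JsonNode): Seq[String] = Seq("id:" + n.title)
  }

  // No marker to maintain by hand: the tag is filled in by the compiler.
  class TaggedLabels extends TaggedExtractor[PageNode] {
    def extract(n: PageNode): Seq[String] = Seq("label:" + n.title)
  }
  class TaggedIds extends TaggedExtractor[JsonNode] {
    def extract(n: JsonNode): Seq[String] = Seq("id:" + n.title)
  }

  val byMarker = SplitByInputType.byMarker(Seq(new Labels, new Ids))
  val byTag = SplitByInputType.byTag(Seq(new TaggedLabels, new TaggedIds))
}
```

The marker version can get out of sync with the type parameter if someone sets the wrong enum value, which is the main reason to prefer a compiler-supplied tag once Scala 2.10 is available.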

*Questions:*
1- Are any suggestions or modifications needed?
2- The only difference now from JC's
Design<https://f.cloud.github.com/assets/607468/363286/1f8da62c-a1ff-11e2-99c3-bb5136accc07.png>
is
that ParseExtractor passes the WikiPage to all inner Extractors instead of
collecting them in one CompositeExtractor.
That extra CompositeExtractor doesn't really add any new functionality; it
just follows the pattern. So do you think we should add it?
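
To make question 2 concrete, here is a small sketch of the two options side by side. The class names FanOutParseExtractor and SingleParseExtractor, the String output and the toy parser are assumptions for illustration only, and variant B reflects how I understand the design in the diagram, not code that exists.

```scala
trait Extractor[T] {
  def extract(input: T): Seq[String]
}

class CompositeExtractor[T](extractors: Seq[Extractor[T]]) extends Extractor[T] {
  def extract(input: T): Seq[String] = extractors.flatMap(_.extract(input))
}

// Variant A (current branch): the parsing wrapper loops over its inner extractors itself.
class FanOutParseExtractor[S, N](parse: S => N, inner: Seq[Extractor[N]]) extends Extractor[S] {
  def extract(source: S): Seq[String] = {
    val node = parse(source)
    inner.flatMap(_.extract(node))
  }
}

// Variant B: the wrapper holds exactly one extractor, typically a CompositeExtractor.
class SingleParseExtractor[S, N](parse: S => N, inner: Extractor[N]) extends Extractor[S] {
  def extract(source: S): Seq[String] = inner.extract(parse(source))
}

object DesignComparison {
  val upper: Extractor[String] = new Extractor[String] {
    def extract(s: String): Seq[String] = Seq(s.toUpperCase)
  }
  val size: Extractor[String] = new Extractor[String] {
    def extract(s: String): Seq[String] = Seq(s.length.toString)
  }

  // Hypothetical parser: turns a page number into a "parsed" string.
  val parse: Int => String = n => "page" + n

  val a = new FanOutParseExtractor[Int, String](parse, Seq(upper, size))
  val b = new SingleParseExtractor[Int, String](
    parse, new CompositeExtractor[String](Seq(upper, size)))
}
```

Both variants produce the same output for the same input, so the choice looks like one of consistency with the composite pattern rather than of behaviour.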


thanks
Regards

-------------------------------------------------
Hady El-Sahar
Research Assistant
Center of Informatics Sciences | Nile University<http://nileuniversity.edu.eg/>


