Hello all,
Regarding issue
#38<https://github.com/dbpedia/extraction-framework/issues/38>
(refactoring the core to accept new formats): I think the new core
functionality is working now. What remains is some modifications, your
suggestions for updates, and of course merging into the main branch.
What was done so far:
1- Changed the Extractor trait to accept a type argument [T] [see
commit<https://github.com/hadyelsahar/extraction-framework/commit/e26ef813dad098d573be34191dfaef13c78b5986>
]
2- Changed the CompositeExtractor class to load extractors of any type, not only
PageNode [see
commit<https://github.com/hadyelsahar/extraction-framework/commit/17dcaa8b2988e7fc8676532fa849fff1eabec9d0>
]
3- Refactored the core [see commit
<https://github.com/hadyelsahar/extraction-framework/commit/9ad75cd864d12025d2872b4e3c6cbe4d4fae3681>
]
- Added a loadToParsers method to CompositeExtractor. This method will:
    - take a list of Extractors and split them by the type they accept
    - create a JsonParseExtractor object and load it with the
      Extractor[Json format] instances
    - create a WikiParseExtractor object and load it with the
      Extractor[PageNode] instances
    - create a CompositeExtractor object and load it with the
      Extractor[WikiPage] instances
- Created a ParseExtractor class which:
    - takes a WikiPageFormat as an argument and decides on a suitable parser for it
    - gets loaded with Extractors
    - at runtime, checks whether the page has the proper WikiPageFormat; if so,
      parses it with that parser and passes it on to all inner Extractors
- WikiParseExtractor and CompositeExtractor are instances of the same
  class ParseExtractor, but with a different WikiPageFormat argument
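
To make the flow above easier to discuss, here is a rough sketch of how I picture loadToParsers and ParseExtractor fitting together. It is not the code from the commit: the Format and InputType enumerations, the simplified WikiPage/PageNode/JsonNode case classes, the String results (instead of quads), the toy parse functions, and the two example extractors are all assumptions made up for illustration.

```scala
object LoadToParsersSketch {
  // Hypothetical stand-ins for the framework's types.
  object Format extends Enumeration { val WikiText, Json = Value }
  object InputType extends Enumeration { val WikiPageInput, PageNodeInput, JsonInput = Value }

  final case class WikiPage(title: String, format: Format.Value, source: String)
  final case class PageNode(title: String, text: String)
  final case class JsonNode(title: String, raw: String)

  trait Extractor[T] {
    def inputType: InputType.Value      // enum tag standing in for the erased T
    def extract(input: T): Seq[String]  // the real framework returns quads, not strings
  }

  // Parses a page only when its format matches, then fans out to the inner extractors.
  class ParseExtractor[N](format: Format.Value, parse: WikiPage => N, inner: Seq[Extractor[N]])
      extends Extractor[WikiPage] {
    val inputType = InputType.WikiPageInput
    def extract(page: WikiPage): Seq[String] =
      if (page.format == format) {
        val node = parse(page)
        inner.flatMap(_.extract(node))
      } else Seq.empty
  }

  // Runs every inner extractor on the same input and concatenates the results.
  class CompositeExtractor[T](val inputType: InputType.Value, inner: Seq[Extractor[T]])
      extends Extractor[T] {
    def extract(input: T): Seq[String] = inner.flatMap(_.extract(input))
  }

  // Splits the extractors by their enum tag and wires them behind the matching parser.
  def loadToParsers(extractors: Seq[Extractor[_]]): Extractor[WikiPage] = {
    def ofType[T](tag: InputType.Value): Seq[Extractor[T]] =
      extractors.filter(_.inputType == tag).map(_.asInstanceOf[Extractor[T]])

    // The parse functions here are placeholders, not real parsers.
    val wikiParse = new ParseExtractor[PageNode](
      Format.WikiText, p => PageNode(p.title, p.source), ofType[PageNode](InputType.PageNodeInput))
    val jsonParse = new ParseExtractor[JsonNode](
      Format.Json, p => JsonNode(p.title, p.source), ofType[JsonNode](InputType.JsonInput))
    val pageLevel = ofType[WikiPage](InputType.WikiPageInput)

    new CompositeExtractor[WikiPage](
      InputType.WikiPageInput, pageLevel ++ Seq[Extractor[WikiPage]](wikiParse, jsonParse))
  }

  // Two toy extractors, one per input type, used only to exercise the sketch.
  object LabelExtractor extends Extractor[PageNode] {
    val inputType = InputType.PageNodeInput
    def extract(node: PageNode): Seq[String] = Seq("label:" + node.title)
  }
  object ClaimExtractor extends Extractor[JsonNode] {
    val inputType = InputType.JsonInput
    def extract(node: JsonNode): Seq[String] = Seq("claim:" + node.title)
  }
}
```

With this shape, a wikitext page handed to the extractor returned by loadToParsers only reaches the PageNode extractors, and a JSON page only reaches the JSON ones; the caller never needs to know which parser was used.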
*Next steps:*
1- Load the WikiData Extractors created in the GSoC project into this branch
2- In CompositeExtractor we need to check for Extractor[T], but T is
erased at runtime. The cleanest way is to use Scala TypeTags, which need
Scala 2.10, so:
    - as a workaround I added a type enumeration to the Extractor class
    - future work would be moving to Scala 2.10, then replacing the enum
      with a check on TypeTags
3- Get rid of the RootExtractor
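
For step 2, this is roughly what replacing the enum with a runtime tag could look like once we are on Scala 2.10. To keep the sketch dependency-free I use ClassTag (from the standard library) rather than TypeTag; TypeTag from scala.reflect.runtime.universe would follow the same pattern but needs scala-reflect on the classpath. Extractor is written as an abstract class here so it can take a context bound, and all names (inputClass, accepting, the toy extractors) are hypothetical.

```scala
import scala.reflect.{ClassTag, classTag}

object TagSketch {
  // Hypothetical, simplified input types.
  final case class WikiPage(title: String)
  final case class PageNode(title: String)

  // Each extractor captures the runtime class of its input type at construction,
  // so no hand-maintained enum value is needed.
  abstract class Extractor[T: ClassTag] {
    val inputClass: Class[_] = classTag[T].runtimeClass
    def extract(input: T): Seq[String]
  }

  class LabelExtractor extends Extractor[PageNode] {
    def extract(node: PageNode): Seq[String] = Seq("label:" + node.title)
  }
  class IdExtractor extends Extractor[WikiPage] {
    def extract(page: WikiPage): Seq[String] = Seq("id:" + page.title)
  }

  // Keeps only the extractors whose input class is exactly T.
  // The cast is justified by the class comparison done just before it.
  def accepting[T: ClassTag](all: Seq[Extractor[_]]): Seq[Extractor[T]] =
    all.filter(_.inputClass == classTag[T].runtimeClass).map(_.asInstanceOf[Extractor[T]])
}
```

One caveat: ClassTag only sees the erased class, so it cannot tell Extractor[List[A]] from Extractor[List[B]]. If we ever need that distinction, TypeTag is the better fit.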
*Questions:*
1- Any suggestions or modifications needed?
2- The only difference now from JC's
Design<https://f.cloud.github.com/assets/607468/363286/1f8da62c-a1ff-11e2-99c3-bb5136accc07.png>
is
that ParseExtractor passes the WikiPage to all inner Extractors instead of
collecting them in one CompositeExtractor.
That doesn't really add any new functionality, it just follows the pattern. So
do you think we should add it?
Thanks,
Regards
-------------------------------------------------
Hady El-Sahar
Research Assistant
Center of Informatics Sciences | Nile University<http://nileuniversity.edu.eg/>