Hi Alexandru,
Thanks for clarifying all my queries.
I am familiar with Java/Scala and am learning Spark. Please give me some more
warm-up tasks so that I can become more familiar with the scope of the
project and understand it deeply.
Thanks
On Fri, Mar 18, 2016 at 6:57 PM, Alexandru Todor <[email protected]>
wrote:
> Hi Pratyush,
>
> Welcome to the DBpedia GSoC mailing list,
>
> I will try to answer your questions:
>
> Do we add the page visit count to the DBpedia Ontology and synchronize it with
>> DBpedia Live?
>
>
> Yes, we should add a new property to each DBpedia Resource called
> wikipediaVisitCount or similar. This property will be hard-coded for the
> time being, since it needs to be present for all Ontology classes and not
> for a specific one. The exact way we should represent the new property
> should be discussed on the dbpedia-ontology mailing list.
>
> We cannot synchronize this with DBpedia Live for now, since there is no
> update stream for the page counts as far as I know. We will automate the
> extraction so that it periodically polls the Wikipedia servers and
> runs an extraction automatically when new information is available. We
> could then feed the output to DBpedia Live or any other endpoint, or just
> make the extracted dumps available.
>
> Since we have to extract page visit and click stream info from Wikipedia,
>> and page visit counts are available in the Wikimedia dumps at
>> https://dumps.wikimedia.org/other/pagecounts-all-sites/ , can we
>> extract these dumps and add the data where we want? I am not able to open
>> these dumps because they take a long time to download.
>
>
> Yes, we download the dumps from Wikipedia. You don't need to download the
> entire dumps; just a couple of files are enough. Your code does need to
> be able to work with the full dump contents, though. I will execute the
> code for you on the full dumps.
>
> If you are familiar with Java and/or Scala and the corresponding Apache
> Spark API, I can proceed to give you more serious warm-up tasks that will
> help you better understand the scope of the project.
>
> Please help me with these issues, and tell me where we can get help from
>> mentors to write the proposal.
>>
>
> I am the mentor for this project and will help you write the proposal. You
> can ask questions here or on the DBpedia ideas website.
>
> Cheers,
> Alexandru
>
>
> On Wed, Mar 16, 2016 at 7:49 AM, Pratyush Kumar <[email protected]>
> wrote:
>
>> Hi all,
>> I am Pratyush Kumar, a B.Tech student from IIT Roorkee, India.
>> I am interested in the DBpedia project "Derived/Extra WikiPage Information
>> Extractor". For this, I have done all the warm-up and recommended tasks. I
>> have good knowledge of Java and Scala, and I am learning Apache Spark.
>> My queries are:
>> Do we add the page visit count to the DBpedia Ontology and synchronize it
>> with DBpedia Live?
>> Since we have to extract page visit and click stream info from
>> Wikipedia, and page visit counts are available in the Wikimedia dumps at
>> https://dumps.wikimedia.org/other/pagecounts-all-sites/ , can we
>> extract these dumps and add the data where we want? I am not able to open
>> these dumps because they take a long time to download.
>> Please help me with these issues, and tell me where we can get help from
>> mentors to write the proposal.
>> Thanks.
>>
>>
>> --
>> With Regards,
>> Pratyush Kumar
>> IIT Roorkee
>> Mo: +91-7895395395
>>
>> _______________________________________________
>> Dbpedia-gsoc mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>
>>
>
--
With Regards,
Pratyush Kumar
IIT Roorkee
Mo: +91-7895395395