Hi Pratyush,
Welcome to the DBpedia GSoC mailing list!
I will try to answer your questions:
> Do we add the page visit count to the DBpedia Ontology and synchronize it with
> DBpedia Live?
Yes, we should add a new property, called wikipediaVisitCount or similar, to
each DBpedia resource. This property will be hard-coded for the time being,
since it needs to be present for all ontology classes rather than for a
specific one. The exact way to represent the new property should be
discussed on the dbpedia-ontology mailing list.
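As a rough illustration only, an extracted statement could be serialized as one N-Triples line per resource. This is a sketch: the property IRI and the xsd:nonNegativeInteger datatype below are placeholders until the representation is agreed on the dbpedia-ontology list, the class name VisitCountTriples is hypothetical, and IRI escaping of special characters in titles is left out.

```java
// Hypothetical sketch: builds one N-Triples statement carrying a page's
// visit count. Property IRI and datatype are placeholders, not an existing
// DBpedia property; IRI escaping of unusual title characters is omitted.
final class VisitCountTriples {
    // Assumed (placeholder) property IRI.
    static final String PROPERTY = "http://dbpedia.org/ontology/wikipediaVisitCount";
    static final String RESOURCE_NS = "http://dbpedia.org/resource/";
    // Assumed datatype for a count value.
    static final String XSD_NON_NEG_INT = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger";

    private VisitCountTriples() {}

    /** Returns an N-Triples line for a Wikipedia page title such as "Berlin". */
    static String toNTriple(String pageTitle, long visitCount) {
        // DBpedia resource IRIs use underscores in place of spaces.
        String resource = RESOURCE_NS + pageTitle.replace(' ', '_');
        return "<" + resource + "> <" + PROPERTY + "> \""
                + visitCount + "\"^^<" + XSD_NON_NEG_INT + "> .";
    }
}
```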
We cannot synchronize this with DBpedia Live for now, since, as far as I
know, there is no update stream for the page counts. Instead, we will
automate the extraction so that it periodically polls the Wikipedia servers
and runs an extraction automatically whenever new data is available. We
could then feed the output to DBpedia Live or any other endpoint, or simply
make the extracted dumps available.
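To make the polling idea a little more concrete, here is a small sketch under a few assumptions: the listing of files currently on the dump server has already been fetched (not shown), the file names sort chronologically because they embed a timestamp, and NewDumpSelector / filesToExtract are purely illustrative names. It only decides which files still need to be extracted.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the "extract only when new data appears" decision.
// Fetching the server listing and running the extraction are not shown.
final class NewDumpSelector {
    private NewDumpSelector() {}

    /** Returns available files not yet processed, in sorted (assumed chronological) order. */
    static List<String> filesToExtract(Collection<String> available, Set<String> processed) {
        // A TreeSet sorts the names; with timestamped names this is chronological.
        TreeSet<String> pending = new TreeSet<>(available);
        pending.removeAll(processed);
        return new ArrayList<>(pending);
    }
}
```

In practice this could be invoked periodically, for example from a java.util.concurrent.ScheduledExecutorService via scheduleAtFixedRate, with each returned file handed to the extraction and then recorded as processed.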
> Since we have to extract page visit and clickstream info from Wikipedia:
> page visit counts are available in the Wikimedia dumps at
> https://dumps.wikimedia.org/other/pagecounts-all-sites/ . Can we
> extract these dumps and add the data where we want? I am not able to open
> these dumps because they take a lot of time to download.
Yes, we download the dumps from Wikipedia. You don't need to download the
entire dump; a couple of files are enough for development. Your code does,
however, need to be able to handle the full dump contents. I will run it
for you on the full dumps.
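For working with just a couple of files, something like the following could serve as a starting point. It is a sketch that assumes each pagecounts line has the space-separated form "project page_title views bytes" (please double-check this against the files you download); PageCountAggregator is a hypothetical name. It sums the view counts per page title for one project and skips lines that do not match.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

// Sketch: sums view counts per page title for one project (e.g. "en").
// Assumed line layout: "<project> <page_title> <views> <bytes>".
final class PageCountAggregator {
    private PageCountAggregator() {}

    static Map<String, Long> aggregate(Stream<String> lines, String project) {
        Map<String, Long> totals = new HashMap<>();
        lines.forEach(line -> {
            String[] fields = line.split(" ");
            // Skip malformed lines and lines belonging to other projects.
            if (fields.length != 4 || !fields[0].equals(project)) {
                return;
            }
            try {
                totals.merge(fields[1], Long.parseLong(fields[2]), Long::sum);
            } catch (NumberFormatException e) {
                // Skip lines whose view-count field is not a number.
            }
        });
        return totals;
    }
}
```

A downloaded .gz file could be fed in via new BufferedReader(new InputStreamReader(new GZIPInputStream(new FileInputStream(path)), StandardCharsets.UTF_8)).lines(). For the full dumps, the same per-line logic would move into a Spark job, since Spark's textFile can read gzip-compressed text files directly.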
If you are familiar with Java and/or Scala and the corresponding Apache
Spark API, I can give you some more substantial warm-up tasks that will
help you better understand the scope of the project.
> Please help me with these issues, and let me know where we can get help
> from mentors to write the proposal.
>
I am the mentor for this project and will help you write the proposal. You
can ask questions here or on the DBpedia ideas website.
Cheers,
Alexandru
On Wed, Mar 16, 2016 at 7:49 AM, Pratyush Kumar <[email protected]>
wrote:
> Hi all,
> I am Pratyush Kumar, a B.Tech student at IIT Roorkee, India.
> I am interested in the DBpedia project "Derived/Extra WikiPage Information
> Extractor". For this, I have done all the warm-up and recommended tasks. I
> have good knowledge of Java and Scala, and I am learning Apache Spark.
> My queries are:
> Do we add the page visit count to the DBpedia Ontology and synchronize it with
> DBpedia Live?
> Since we have to extract page visit and clickstream info from Wikipedia:
> page visit counts are available in the Wikimedia dumps at
> https://dumps.wikimedia.org/other/pagecounts-all-sites/ . Can we
> extract these dumps and add the data where we want? I am not able to open
> these dumps because they take a lot of time to download.
> Please help me with these issues, and let me know where we can get help
> from mentors to write the proposal.
> Thanks.
>
>
> --
> With Regards,
> Pratyush Kumar
> IIT Roorkee
> Mo: +91-7895395395
>
> ------------------------------------------------------------------------------
> Transform Data into Opportunity.
> Accelerate data analysis in your applications with
> Intel Data Analytics Acceleration Library.
> Click to learn more.
> http://pubads.g.doubleclick.net/gampad/clk?id=278785231&iu=/4140
> _______________________________________________
> Dbpedia-gsoc mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>
>