Dario Taraborelli <dtaraborelli@...> writes:

> 
> what Greg said, Common Crawl is an excellent data source to answer
> these questions, see:
> 
> http://blog.commoncrawl.org/2015/04/announcing-the-common-crawl-index/
> http://blog.commoncrawl.org/2015/02/wikireverse-visualizing-reverse-links-with-open-data/
> 
> for aggregate stats about referrals to individual articles by traffic
> and aggregated at domain level, you may also be interested in this
> dataset:
> 
> http://figshare.com/articles/Wikipedia_Clickstream/1305770
> 
> > On Dec 2, 2015, at 8:06 AM, Greg Lindahl <lindahl <at> pbm.com> wrote:
> > 
> > On Tue, Dec 01, 2015 at 07:50:23PM +0100, Federico Leva (Nemo) wrote:
> >> Edison Nica, 29/11/2015 16:56:
> >>> how many non-wikipedia pages point to a certain wikipedia page
> >> 
> >> I guess the only way we have to know this (other than grepping
> >> request logs for referrers, which would be quite a nightmare) is to
> >> access the Google Webmaster account for wikipedia.org (to which a
> >> couple employees had access, IIRC).
> > 
> > There are a couple of other ways to figure out inlinks:
> > 
> > * Common Crawl
> > * Commercial SEO services like Moz or Ahrefs
> > 
> > In the medium term the Internet Archive is going to be generating this
> > kind of link data as part of the Wayback Machine search engine effort.
> > 
> > And finally, Edison, counting the number of inlinks without
> > considering their rank or popularity will probably leave you
> > vulnerable to people orchestrating googlebombs. And you might want to
> > also know the anchortext, that's extremely valuable for search
> > indexing.
> > 
> > -- greg
> > 
> > 
> > 
> > _______________________________________________
> > Analytics mailing list
> > Analytics <at> lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> 
> Dario Taraborelli  Head of Research, Wikimedia Foundation
> wikimediafoundation.org • nitens.org •  <at> readermeter
> 
> 

Thank you all for your replies, and I apologize for my improper use of 
English (see 'no offence').

I built my first Wikipedia search app a while ago. It is a test bed for 
my offline search engine, and it contains only medical information for 
now.

https://play.google.com/store/apps/details?id=com.zeropii.publish.txt.medical

(BTW, this app requests no permissions and does not track what the user 
searches for. Watch out if you plan to install it: the APK is 78MB.)

I am now building the second version, which will extend to full 
Wikipedia.

If everything works right, I have another 3-6 months before I will need 
the Analytics data to improve the search.


BTW, if this is public information, what search engine do you use?
Do you use a custom one?
Do you use the Analytics data to refine search?

My goal is to understand if Analytics could substantially improve my 
(Wikipedia) search engine or not.

Thank you again for your answers and pointers!

Edison Nica
www.0Pii.com
Edisonn at 0pii dot com