Also, important categories like Computer Architecture, Human-based
computation, Programming language theory, Software Engineering, and Theory
of Computation are missing from the subcategories of Areas of Computer
Science.
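
In case it helps anyone reproduce this, here is a minimal sketch (not the
WikiUtils code) of how the raw categorylinks dump could be scanned for the
members of one category. It assumes each data line is a single
"INSERT INTO ... VALUES (...),(...);" statement, and that cl_from is the
first column, cl_to the second, and cl_type the last, as in the listing
quoted below. The function names are hypothetical.

```python
import gzip


def parse_insert_values(line):
    """Yield each (...) tuple of an 'INSERT INTO ... VALUES' line as a list of strings."""
    start = line.find("VALUES")
    if not line.startswith("INSERT INTO") or start < 0:
        return  # schema, comment, or blank line: nothing to yield
    i, n = start + len("VALUES"), len(line)
    row, field, in_row, in_str = [], [], False, False
    while i < n:
        ch = line[i]
        if in_str:
            if ch == "\\" and i + 1 < n:
                # Backslash escape: keep the escaped character as-is
                # (so \' becomes '; \n is NOT decoded to a newline).
                field.append(line[i + 1])
                i += 1
            elif ch == "'":
                in_str = False
            else:
                field.append(ch)
        elif ch == "'":
            in_str = True
        elif ch == "(" and not in_row:
            in_row, row, field = True, [], []
        elif ch == "," and in_row:
            row.append("".join(field))
            field = []
        elif ch == ")" and in_row:
            row.append("".join(field))
            yield row
            in_row = False
        elif in_row:
            field.append(ch)  # unquoted value such as a number or NULL
        i += 1


def members_of(category, rows):
    """Group cl_from IDs by cl_type for rows whose cl_to equals `category`.

    Assumption: cl_from is column 0, cl_to is column 1, cl_type is the last column.
    """
    found = {}
    for row in rows:
        if len(row) >= 3 and row[1] == category:
            found.setdefault(row[-1], []).append(int(row[0]))
    return found


def scan_dump(path, category):
    """Scan a gzipped categorylinks dump and return {cl_type: [cl_from, ...]}."""
    # errors="replace" because the sortkey columns may hold non-UTF-8 bytes.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        rows = (r for line in fh for r in parse_insert_values(line))
        return members_of(category, rows)
```

Calling scan_dump("enwiki-20170920-categorylinks.sql.gz",
"Areas_of_computer_science") should return the page IDs grouped under
keys such as 'page' and 'subcat'; those IDs can then be looked up in
page.sql.gz to see which ones have no matching row. Parsing character by
character in pure Python will be slow on a dump of this size, so this is
only meant as a cross-check.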


*Regards,*
*Shubhanshu Mishra*
Research Assistant,
iSchool at University of Illinois at Urbana-Champaign
--------------------------------------------------
*Website:* http://shubhanshu.com
*LinkedIn Profile:* http://www.linkedin.com/in/shubhanshumishra

Blog <http://shubhanshu.com/blog>  || Facebook
<http://www.facebook.com/shubhanshu.mishra>  ||  Twitter
<http://www.twitter.com/TheShubhanshu>  || LinkedIn
<http://www.linkedin.com/in/shubhanshumishra>

On Wed, Nov 1, 2017 at 10:42 AM, Shubhanshu Mishra <
[email protected]> wrote:

> Hi,
>
> When using the Wikipedia dump files, I am unable to find many categories
> and pages in the dump.
>
> For example, under the Areas_of_computer_science category I get only 13
> subcategories and 2 pages instead of 17 subcategories and 2 pages.
> Furthermore, one page, "Computational_creativity", is not present as a
> subcategory.
>
> I am using the following Wikipedia dump files to extract the
> categorylinks and page details:
>
> 1.6G Sep 21   00:45 enwiki-20170920-page.sql.gz
> 21M Sep 21    00:45 enwiki-20170920-category.sql.gz
> 113M Sep 21   00:55 enwiki-20170920-redirect.sql.gz
> 2.2G Sep 21   03:10 enwiki-20170920-categorylinks.sql.gz
> 221M Sep 21   03:13 enwiki-20170920-page_props.sql.gz
>
>
> I use https://github.com/napsternxg/WikiUtils to parse the sql.gz dump
> files, but I also tried searching the sql.gz files directly and couldn't
> find any entry for 16300571 in either page.sql.gz or category.sql.gz.
> 16300571 supposedly refers to the Computational_creativity page, as
> the following categories are linked to this page:
>
> 16300571 'All_NPOV_disputes'    'page'
> 16300571 'All_articles_needing_additional_references'   'page'
> 16300571 'All_articles_with_dead_external_links'        'page'
> 16300571 'All_articles_with_unsourced_statements'       'page'
> 16300571 'Areas_of_computer_science'    'page'
> 16300571 'Articles_needing_additional_references_from_May_2013' 'page'
> 16300571 'Articles_with_French-language_external_links' 'page'
> 16300571 'Articles_with_dead_external_links_from_November_2016' 'page'
> 16300571 'Articles_with_permanently_dead_external_links'        'page'
> 16300571 'Articles_with_unsourced_statements_from_April_2015'   'page'
> 16300571 'Articles_with_unsourced_statements_from_April_2016'   'page'
> 16300571 'Articles_with_unsourced_statements_from_December_2015' 'page'
> 16300571 'Articles_with_unsourced_statements_from_January_2010' 'page'
> 16300571 'Articles_with_unsourced_statements_from_October_2016' 'page'
> 16300571 'Artificial_intelligence'      'page'
> 16300571 'Arts' 'page'
> 16300571 'CS1_maint:_Extra_text:_authors_list'  'page'
> 16300571 'Cognitive_psychology' 'page'
> 16300571 'Computational_fields_of_study'        'page'
> 16300571 'Creativity_techniques'        'page'
> 16300571 'NPOV_disputes_from_January_2013'      'page'
> 16300571 'Philosophical_movements'      'page'
> 16300571 'Webarchive_template_wayback_links'    'page'
> 16300571 'Wikipedia_articles_needing_clarification_from_November_2008' 'page'
>
> More details can be found at:
> https://twitter.com/TheShubhanshu/status/925736635572072449
>
> Is there something I am doing wrong, or are these rows just missing from
> the dumps?
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
