This is an automated email from the git hooks/post-receive script. tille pushed a commit to branch master in repository r-cran-wikidatar.
commit 9d0cd5b102b62ce9e38421e550407a60279cc818
Author: Andreas Tille <[email protected]>
Date: Mon Oct 2 15:10:28 2017 +0200

    New upstream version 1.4.0
---
 DESCRIPTION                  |  21 ++++
 LICENSE                      |   2 +
 MD5                          |  29 +++++
 NAMESPACE                    |  19 +++
 NEWS                         |  33 +++++
 R/WikidataR.R                |  15 +++
 R/geo.R                      | 182 +++++++++++++++++++++++++++
 R/gets.R                     | 127 +++++++++++++++++++
 R/prints.R                   | 125 +++++++++++++++++++
 R/utils.R                    |  90 ++++++++++++++
 README.md                    |  38 ++++++
 build/vignette.rds           | Bin 0 -> 211 bytes
 inst/doc/Introduction.R      |  36 ++++++
 inst/doc/Introduction.Rmd    |  82 ++++++++++++
 inst/doc/Introduction.html   | 290 +++++++++++++++++++++++++++++++++++++++++++
 man/WikidataR.Rd             |  18 +++
 man/extract_claims.Rd        |  33 +++++
 man/find_item.Rd             |  41 ++++++
 man/get_geo_box.Rd           |  58 +++++++++
 man/get_geo_entity.Rd        |  58 +++++++++
 man/get_item.Rd              |  42 +++++++
 man/get_random.Rd            |  39 ++++++
 man/print.find_item.Rd       |  16 +++
 man/print.find_property.Rd   |  16 +++
 man/print.wikidata.Rd        |  19 +++
 tests/testthat.R             |   4 +
 tests/testthat/test_geo.R    |  48 +++++++
 tests/testthat/test_gets.R   |  30 +++++
 tests/testthat/test_search.R |  17 +++
 vignettes/Introduction.Rmd   |  82 ++++++++++++
 30 files changed, 1610 insertions(+)

diff --git a/DESCRIPTION b/DESCRIPTION
new file mode 100644
index 0000000..c0657f9
--- /dev/null
+++ b/DESCRIPTION
@@ -0,0 +1,21 @@
+Package: WikidataR
+Type: Package
+Title: API Client Library for 'Wikidata'
+Version: 1.4.0
+Date: 2017-09-21
+Author: Oliver Keyes [aut, cre], Serena Signorelli [aut, cre],
+ Christian Graul [ctb], Mikhail Popov [ctb]
+Maintainer: Oliver Keyes <[email protected]>
+Description: An API client for the Wikidata <http://wikidata.org/> store of
+ semantic data.
+BugReports: https://github.com/Ironholds/WikidataR/issues
+URL: https://github.com/Ironholds/WikidataR/issues
+License: MIT + file LICENSE
+Imports: httr, jsonlite, WikipediR (>= 1.4.0), utils
+Suggests: testthat, knitr, pageviews
+VignetteBuilder: knitr
+RoxygenNote: 6.0.1
+NeedsCompilation: no
+Packaged: 2017-09-22 02:22:59 UTC; ironholds
+Repository: CRAN
+Date/Publication: 2017-09-22 05:43:08 UTC
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..ebbb227
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,2 @@
+YEAR: 2014
+COPYRIGHT HOLDER: Oliver Keyes
\ No newline at end of file
diff --git a/MD5 b/MD5
new file mode 100644
index 0000000..3fd3d6c
--- /dev/null
+++ b/MD5
@@ -0,0 +1,29 @@
+eb02df461c648d4da3f983afc54503d5 *DESCRIPTION
+1d9678dbfe1732b5d2c521e07b2ceef0 *LICENSE
+8f5819571233c6d8d08d23f9bfc9979b *NAMESPACE
+2776dd31c6533290c7fd2cd414a2b4bf *NEWS
+e6967d650ab6b6462db1793f0fe5a46b *R/WikidataR.R
+5ad80eca5081277b549234400a2dd7a3 *R/geo.R
+4229fe3d75d444beb2fa00ae2bdcdba6 *R/gets.R
+e588e32737791defc6f982114f39d75c *R/prints.R
+c16306d76abfe6d0e78dd62bf77173c7 *R/utils.R
+6095c718be80727c886cff790734e9b5 *README.md
+a1d7177a65e4773e0c7fae2ccb9d143d *build/vignette.rds
+43cc957bbe79bc0b25b62be190705064 *inst/doc/Introduction.R
+5ab492a540df058a91940716bf3e9c4f *inst/doc/Introduction.Rmd
+3e88344829cf501478fb9e8c841a18b5 *inst/doc/Introduction.html
+dea44cd789a89155878f75eb0c430541 *man/WikidataR.Rd
+ca32f05afde2f042aa5ce7c799d63976 *man/extract_claims.Rd
+d6439bd1505303b2c069a9ec5a482346 *man/find_item.Rd
+6486678f64813a107103352d076f2ed3 *man/get_geo_box.Rd
+3d485c862e1ab25782c98cf2d2e8c009 *man/get_geo_entity.Rd
+d44df4503eefe77a44f15be028f977b7 *man/get_item.Rd
+9a902a02739165e2a862571144e74ebd *man/get_random.Rd
+aa48f8096742e46ef2f78f0ec960b039 *man/print.find_item.Rd
+5f88c4bb32c2b352ff1cab1f9124a982 *man/print.find_property.Rd
+294468449f0be62ebd8443bd10e926be *man/print.wikidata.Rd
+ced86f667bcd51239f1c0d5d5c1a492b *tests/testthat.R
+8f0a71f6693281b0d26afe5532158157 *tests/testthat/test_geo.R
+2986dd17d5e90976391d811f9bb3bb1c *tests/testthat/test_gets.R
+38bd7da6e5db4b243603b6d0d53e8cdd *tests/testthat/test_search.R
+5ab492a540df058a91940716bf3e9c4f *vignettes/Introduction.Rmd
diff --git a/NAMESPACE b/NAMESPACE
new file mode 100644
index 0000000..a40769d
--- /dev/null
+++ b/NAMESPACE
@@ -0,0 +1,19 @@
+# Generated by roxygen2: do not edit by hand
+
+S3method(print,find_item)
+S3method(print,find_property)
+S3method(print,wikidata)
+export(extract_claims)
+export(find_item)
+export(find_property)
+export(get_geo_box)
+export(get_geo_entity)
+export(get_item)
+export(get_property)
+export(get_random_item)
+export(get_random_property)
+importFrom(WikipediR,page_content)
+importFrom(WikipediR,query)
+importFrom(WikipediR,random_page)
+importFrom(httr,user_agent)
+importFrom(jsonlite,fromJSON)
diff --git a/NEWS b/NEWS
new file mode 100644
index 0000000..84e9813
--- /dev/null
+++ b/NEWS
@@ -0,0 +1,33 @@
+1.4.0
+=================================================
+* extract_claims() allows you to, well, extract claims.
+* SPARQL syntax bug with some geo queries now fixed (thanks to Mikhail Popov)
+
+1.3.0
+=================================================
+* get_* functions are now vectorised
+
+1.2.0
+=================================================
+* geographic data for entities that exist relative to other Wikidata items can now be retrieved
+with get_geo_entity and get_geo_box, courtesy of excellent Serena Signorelli's excellent
+QueryWikidataR package.
+
+* A bug in printing returned objects is now fixed.
+
+1.1.0
+=================================================
+* You can now retrieve multiple random properties or items with get_random_item and get_random_property
+
+1.0.1
+=================================================
+* Various documentation and metadata improvements.
+
+1.0.0
+=================================================
+* Fix a bug in get_* functions due to a parameter name mismatch
+* Print methods added by Christian Graul
+
+0.5.0
+=================================================
+* This is the initial release! See the explanatory vignettes.
diff --git a/R/WikidataR.R b/R/WikidataR.R
new file mode 100644
index 0000000..9614e67
--- /dev/null
+++ b/R/WikidataR.R
@@ -0,0 +1,15 @@
+#' @title API client library for Wikidata
+#' @description This package serves as an API client for \href{Wikidata}{https://www.wikidata.org}.
+#' See the accompanying vignette for more details.
+#'
+#' @name WikidataR
+#' @docType package
+#'@seealso \code{\link{get_random}} for selecting a random item or property,
+#'\code{\link{get_item}} for a /specific/ item or property, or \code{\link{find_item}}
+#'for using search functionality to pull out item or property IDs where the descriptions
+#'or aliases match a particular search term.
+#' @importFrom WikipediR page_content random_page query
+#' @importFrom httr user_agent
+#' @importFrom jsonlite fromJSON
+#' @aliases WikidataR WikidataR-package
+NULL
\ No newline at end of file
diff --git a/R/geo.R b/R/geo.R
new file mode 100644
index 0000000..37efdff
--- /dev/null
+++ b/R/geo.R
@@ -0,0 +1,182 @@
+clean_geo <- function(results){
+ do.call("rbind", lapply(results, function(item){
+ point <- unlist(strsplit(gsub(x = item$coord$value, pattern = "(Point\\(|\\))", replacement = ""),
+ " "))
+ wd_id <- gsub(x = item$item$value, pattern = "http://www.wikidata.org/entity/",
+ replacement = "", fixed = TRUE)
+ return(data.frame(item = wd_id,
+ name = ifelse(item$name$value == wd_id, NA, item$name$value),
+ latitutde = as.numeric(point[1]),
+ longitude = as.numeric(point[2]),
+ stringsAsFactors = FALSE))
+
+ }))
+}
+
+#'@title Retrieve geographic information from Wikidata
+#'@description \code{get_geo_entity} retrieves the item ID, latitude
+#'and longitude of any object with geographic data associated with \emph{another}
+#'object with geographic data (example: all the locations around/near/associated with
+#'a city).
+#'
+#'@param entity a Wikidata item (\code{Q...}) or series of items, to check
+#'for associated geo-tagged items.
+#'
+#'@param language the two-letter language code to use for the name
+#'of the item. "en" by default, because we're imperialist
+#'anglocentric westerners.
+#'
+#'@param radius optionally, a radius (in kilometers) around \code{entity}
+#'to restrict the search to.
+#'
+#'@param ... further arguments to pass to httr's GET.
+#'
+#'@return a data.frame of 5 columns:
+#'\itemize{
+#' \item{item}{ the Wikidata identifier of each object associated with
+#' \code{entity}.}
+#' \item{name}{ the name of the item, if available, in the requested language. If it
+#' is not available, \code{NA} will be returned instead.}
+#' \item{latitude}{ the latitude of \code{item}}
+#' \item{longitude}{ the longitude of \code{item}}
+#' \item{entity}{ the entity the item is associated with (necessary for multi-entity
+#' queries).}
+#'}
+#'
+#'@examples
+#'# All entities
+#'sf_locations <- get_geo_entity("Q62")
+#'
+#'# Entities with French, rather than English, names
+#'sf_locations <- get_geo_entity("Q62", language = "fr")
+#'
+#'# Entities within 1km
+#'sf_close_locations <- get_geo_entity("Q62", radius = 1)
+#'
+#'# Multiple entities
+#'multi_entity <- get_geo_entity(entity = c("Q62", "Q64"))
+#'
+#'@seealso \code{\link{get_geo_box}} for using a bounding box
+#'rather than an unrestricted search or simple radius.
+#'
+#'@export
+get_geo_entity <- function(entity, language = "en", radius = NULL, ...){
+
+ entity <- check_input(entity, "Q")
+
+ if(is.null(radius)){
+ query <- paste0("SELECT DISTINCT ?item ?name ?coord ?propertyLabel WHERE {
+ ?item wdt:P131* wd:", entity, ". ?item wdt:P625 ?coord .
+ SERVICE wikibase:label {
+ bd:serviceParam wikibase:language \"", language, "\" .
+ ?item rdfs:label ?name
+ }
+ }
+ ORDER BY ASC (?name)")
+ } else {
+ query <- paste0("SELECT ?item ?name ?coord
+ WHERE {
+ wd:", entity, " wdt:P625 ?mainLoc .
+ SERVICE wikibase:around {
+ ?item wdt:P625 ?coord .
+ bd:serviceParam wikibase:center ?mainLoc .
+ bd:serviceParam wikibase:radius \"", radius,
+ "\" .
+ }
+ SERVICE wikibase:label {
+ bd:serviceParam wikibase:language \"", language, "\" .
+ ?item rdfs:label ?name
+ }
+ } ORDER BY ASC (?name)")
+ }
+
+ if(length(query) > 1){
+ return(do.call("rbind", mapply(function(query, entity, ...){
+ output <- clean_geo(sparql_query(query, ...)$results$bindings)
+ output$entity <- entity
+ return(output)
+ }, query = query, entity = entity, ..., SIMPLIFY = FALSE)))
+ }
+ output <- clean_geo(sparql_query(query)$results$bindings)
+ output$entity <- entity
+ return(output)
+}
+
+#'@title Get geographic entities based on a bounding box
+#'@description \code{get_geo_box} retrieves all geographic entities in
+#'Wikidata that fall between a bounding box between two existing items
+#'with geographic attributes (usually cities).
+#'
+#'@param first_city_code a Wikidata item, or series of items, to use for
+#'one corner of the bounding box.
+#'
+#'@param first_corner the direction of \code{first_city_code} relative
+#'to \code{city} (eg "NorthWest", "SouthEast").
+#'
+#'@param second_city_code a Wikidata item, or series of items, to use for
+#'one corner of the bounding box.
+#'
+#'@param second_corner the direction of \code{second_city_code} relative
+#'to \code{city} (eg "NorthWest", "SouthEast").
+#'
+#'@param language the two-letter language code to use for the name
+#'of the item. "en" by default.
+#'
+#'@param ... further arguments to pass to httr's GET.
+#'
+#'@return a data.frame of 5 columns:
+#'\itemize{
+#' \item{item}{ the Wikidata identifier of each object associated with
+#' \code{entity}.}
+#' \item{name}{ the name of the item, if available, in the requested language. If it
+#' is not available, \code{NA} will be returned instead.}
+#' \item{latitude}{ the latitude of \code{item}}
+#' \item{longitude}{ the longitude of \code{item}}
+#' \item{entity}{ the entity the item is associated with (necessary for multi-entity
+#' queries).}
+#'}
+#'
+#'@examples
+#'# Simple bounding box
+#'bruges_box <- WikidataR:::get_geo_box("Q12988", "NorthEast", "Q184287", "SouthWest")
+#'
+#'# Custom language
+#'bruges_box_fr <- WikidataR:::get_geo_box("Q12988", "NorthEast", "Q184287", "SouthWest",
+#' language = "fr")
+#'
+#'@seealso \code{\link{get_geo_entity}} for using an unrestricted search or simple radius,
+#'rather than a bounding box.
+#'
+#'@export
+get_geo_box <- function(first_city_code, first_corner, second_city_code, second_corner,
+ language = "en", ...){
+
+ # Input checks
+ first_city_code <- check_input(first_city_code, "Q")
+ second_city_code <- check_input(second_city_code, "Q")
+
+ # Construct query
+ query <- paste0("SELECT ?item ?name ?coord WHERE {
+ wd:", first_city_code, " wdt:P625 ?Firstloc .
+ wd:", second_city_code, " wdt:P625 ?Secondloc .
+ SERVICE wikibase:box {
+ ?item wdt:P625 ?coord .
+ bd:serviceParam wikibase:corner", first_corner, " ?Firstloc .
+ bd:serviceParam wikibase:corner", second_corner, " ?Secondloc .
+ }
+ SERVICE wikibase:label {
+ bd:serviceParam wikibase:language \"", language, "\" .
+ ?item rdfs:label ?name
+ }
+ }ORDER BY ASC (?name)")
+
+ # Vectorise if necessary, or not if not!
+ if(length(query) > 1){
+ return(do.call("rbind", mapply(function(query, ...){
+ output <- clean_geo(sparql_query(query, ...)$results$bindings)
+ return(output)
+ }, query = query, ..., SIMPLIFY = FALSE)))
+ }
+ output <- clean_geo(sparql_query(query)$results$bindings)
+ return(output)
+}
\ No newline at end of file
diff --git a/R/gets.R b/R/gets.R
new file mode 100644
index 0000000..2a9c5ce
--- /dev/null
+++ b/R/gets.R
@@ -0,0 +1,127 @@
+#'@title Retrieve specific Wikidata items or properties
+#'@description \code{get_item} and \code{get_property} allow you to retrieve the data associated
+#'with individual Wikidata items and properties, respectively. As with
+#'other \code{WikidataR} code, custom print methods are available; use \code{\link{str}}
+#'to manipulate and see the underlying structure of the data.
+#'
+#'@param id the ID number(s) of the item or property you're looking for. This can be in
+#'various formats; either a numeric value ("200"), the full name ("Q200") or
+#'even with an included namespace ("Property:P10") - the function will format
+#'it appropriately. This function is vectorised and will happily accept
+#'multiple IDs.
+#'
+#'@param ... further arguments to pass to httr's GET.
+#'
+#'@seealso \code{\link{get_random}} for selecting a random item or property,
+#'or \code{\link{find_item}} for using search functionality to pull out
+#'item or property IDs where the descriptions or aliases match a particular
+#'search term.
+#'
+#'@examples
+#'
+#'#Retrieve a specific item
+#'adams_metadata <- get_item("42")
+#'
+#'#Retrieve a specific property
+#'object_is_child <- get_property("P40")
+#'
+#'@aliases get_item get_property
+#'@rdname get_item
+#'@export
+get_item <- function(id, ...){
+ id <- check_input(id, "Q")
+ output <- (lapply(id, wd_query, ...))
+ class(output) <- "wikidata"
+ return(output)
+}
+
+#'@rdname get_item
+#'@export
+get_property <- function(id, ...){
+ has_grep <- grepl("^P(?!r)",id, perl = TRUE)
+ id[has_grep] <- paste0("Property:", id[has_grep])
+ id <- check_input(id, "Property:P")
+
+ output <- (lapply(id, wd_query, ...))
+ class(output) <- "wikidata"
+ return(output)
+}
+
+#'@title Retrieve randomly-selected Wikidata items or properties
+#'@description \code{get_random_item} and \code{get_random_property} allow you to retrieve the data
+#'associated with randomly-selected Wikidata items and properties, respectively. As with
+#'other \code{WikidataR} code, custom print methods are available; use \code{\link{str}}
+#'to manipulate and see the underlying structure of the data.
+#'
+#'@param limit how many random items to return. 1 by default, but can be higher.
+#'
+#'@param ... arguments to pass to httr's GET.
+#'
+#'@seealso \code{\link{get_item}} for selecting a specific item or property,
+#'or \code{\link{find_item}} for using search functionality to pull out
+#'item or property IDs where the descriptions or aliases match a particular
+#'search term.
+#'
+#'@examples
+#'
+#'#Random item
+#'random_item <- get_random_item()
+#'
+#'#Random property
+#'random_property <- get_random_property()
+#'
+#'@aliases get_random get_random_item get_random_property
+#'@rdname get_random
+#'@export
+get_random_item <- function(limit = 1, ...){
+ return(wd_rand_query(ns = 0, limit = limit, ...))
+}
+
+#'@rdname get_random
+#'@export
+get_random_property <- function(limit = 1, ...){
+ return(wd_rand_query(ns = 120, limit = limit, ...))
+}
+
+#'@title Search for Wikidata items or properties that match a search term
+#'@description \code{find_item} and \code{find_property} allow you to retrieve a set
+#'of Wikidata items or properties where the aliase or descriptions match a particular
+#'search term. As with other \code{WikidataR} code, custom print methods are available;
+#'use \code{\link{str}} to manipulate and see the underlying structure of the data.
+#'
+#'@param search_term a term to search for.
+#'
+#'@param language the language to return the labels and descriptions in; this should
+#'consist of an ISO language code. Set to "en" by default.
+#'
+#'@param limit the number of results to return; set to 10 by default.
+#'
+#'@param ... further arguments to pass to httr's GET.
+#'
+#'@seealso \code{\link{get_random}} for selecting a random item or property,
+#'or \code{\link{get_item}} for selecting a specific item or property.
+#'
+#'@examples
+#'
+#'#Check for entries relating to Douglas Adams in some way
+#'adams_items <- find_item("Douglas Adams")
+#'
+#'#Check for properties involving the peerage
+#'peerage_props <- find_property("peerage")
+#'
+#'@aliases find_item find_property
+#'@rdname find_item
+#'@export
+find_item <- function(search_term, language = "en", limit = 10, ...){
+ res <- searcher(search_term, language, limit, "item")
+ class(res) <- "find_item"
+ return(res)
+}
+
+#'@rdname find_item
+#'@export
+find_property <- function(search_term, language = "en", limit = 10){
+ res <- searcher(search_term, language, limit, "property")
+ class(res) <- "find_property"
+ return(res)
+}
diff --git a/R/prints.R b/R/prints.R
new file mode 100644
index 0000000..261287d
--- /dev/null
+++ b/R/prints.R
@@ -0,0 +1,125 @@
+#'@title Print method for find_item
+#'
+#'@description print found items.
+#'
+#'@param x find_item object with search results
+#'@param \dots Arguments to be passed to methods
+#'
+#'@method print find_item
+#'@export
+print.find_item <- function(x, ...) {
+ cat("\n\tWikidata item search\n\n")
+
+ # number of results
+ num_results <- length(x)
+ cat("Number of results:\t", num_results, "\n\n")
+
+ # results
+ if(num_results > 0) {
+ cat("Results:\n")
+ for(i in 1:num_results) {
+ if(is.null(x[[i]]$description)){
+ desc <- "\n"
+ }
+ else {
+ desc <- paste("-", x[[i]]$description, "\n")
+ }
+ cat(i, "\t", x[[i]]$label, paste0("(", x[[i]]$id, ")"), desc)
+ }
+ }
+}
+
+#'@title Print method for find_property
+#'
+#'@description print found properties.
+#'
+#'@param x find_property object with search results
+#'@param \dots Arguments to be passed to methods
+#'
+#'@method print find_property
+#'@export
+print.find_property <- function(x, ...) {
+ cat("\n\tWikidata property search\n\n")
+
+ # number of results
+ num_results <- length(x)
+ cat("Number of results:\t", num_results, "\n\n")
+
+ # results
+ if(num_results > 0) {
+ cat("Results:\n")
+ for(i in seq_len(num_results)) {
+ if(is.null(x[[i]]$description)){
+ desc <- "\n"
+ }
+ else {
+ desc <- paste("-", x[[i]]$description, "\n")
+ }
+ cat(i, "\t", x[[i]]$label, paste0("(", x[[i]]$id, ")"), desc)
+ }
+ }
+}
+
+wd_print_base <- function(x, ...){
+
+ cat("\n\tWikidata", x$type, x$id, "\n\n")
+
+ # labels
+ num.labels <- length(x$labels)
+ if(num.labels>0) {
+ lbl <- x$labels[[1]]$value
+ if(num.labels==1) cat("Label:\t\t", lbl, "\n")
+ else {
+ if(!is.null(x$labels$en)) lbl <- x$labels$en$value
+ cat("Label:\t\t", lbl, paste0("\t[", num.labels-1, " other languages available]\n"))
+ }
+ }
+
+ # aliases
+ num_aliases <- length(x$aliases)
+ if(num_aliases > 0) {
+ al <- unique(unlist(lapply(x$aliases, function(xl){return(xl$value)})))
+ cat("Aliases:\t", paste(al, collapse = ", "), "\n")
+ }
+
+ # descriptions
+ num_desc <- length(x$descriptions)
+ if(num_desc > 0) {
+ desc <- x$descriptions[[1]]$value
+ if(num_desc == 1){
+ cat("Description:", desc, "\n")
+ }
+ else {
+ if(!is.null(x$descriptions$en)){
+ desc <- x$descriptions$en$value
+ }
+ cat("Description:", desc, paste0("\t[", (num_desc - 1), " other languages available]\n"))
+ }
+ }
+
+ # num claims
+ num_claims <- length(x$claims)
+ if(num_claims > 0){
+ cat("Claims:\t\t", num_claims, "\n")
+ }
+
+ # num sitelinks
+ num_links <- length(x$sitelinks)
+ if(num_links > 0){
+ cat("Sitelinks:\t", num_links, "\n")
+ }
+}
+
+#'@title Print method for Wikidata objects
+#'
+#'@description print found objects generally.
+#'
+#'@param x wikidata object from get_item, get_random_item, get_property or get_random_property
+#'@param \dots Arguments to be passed to methods
+#'@seealso get_item, get_random_item, get_property or get_random_property
+#'@method print wikidata
+#'@export
+print.wikidata <- function(x, ...){
+ lapply(x, wd_print_base, ...)
+ return(invisible())
+}
\ No newline at end of file
diff --git a/R/utils.R b/R/utils.R
new file mode 100644
index 0000000..ffde3cc
--- /dev/null
+++ b/R/utils.R
@@ -0,0 +1,90 @@
+#Generic queryin' function for direct Wikidata calls. Wraps around WikipediR::page_content.
+wd_query <- function(title, ...){
+ result <- WikipediR::page_content(domain = "wikidata.org", page_name = title, as_wikitext = TRUE,
+ httr::user_agent("WikidataR - https://github.com/Ironholds/WikidataR"),
+ ...)
+ output <- jsonlite::fromJSON(result$parse$wikitext[[1]])
+ return(output)
+}
+
+#Query for a random item in "namespace" (ns). Essentially a wrapper around WikipediR::random_page.
+wd_rand_query <- function(ns, limit, ...){
+ result <- WikipediR::random_page(domain = "wikidata.org", as_wikitext = TRUE, namespaces = ns,
+ httr::user_agent("WikidataR - https://github.com/Ironholds/WikidataR"),
+ limit = limit, ...)
+ output <- lapply(result, function(x){jsonlite::fromJSON(x$wikitext[[1]])})
+ class(output) <- "wikidata"
+ return(output)
+
+}
+
+#Generic input checker. Needs additional stuff for property-based querying
+#because namespaces are weird, yo.
+check_input <- function(input, substitution){
+ in_fit <- grepl("^\\d+$",input)
+ if(any(in_fit)){
+ input[in_fit] <- paste0(substitution, input[in_fit])
+ }
+ return(input)
+}
+
+#Generic, direct access to Wikidata's search functionality.
+searcher <- function(search_term, language, limit, type, ...){
+ result <- WikipediR::query(url = "https://www.wikidata.org/w/api.php", out_class = "list", clean_response = FALSE,
+ query_param = list(
+ action = "wbsearchentities",
+ type = type,
+ language = language,
+ limit = limit,
+ search = search_term
+ ),
+ ...)
+ result <- result$search
+ return(result)
+}
+
+sparql_query <- function(params, ...){
+ result <- httr::GET("https://query.wikidata.org/bigdata/namespace/wdq/sparql",
+ query = list(query = params),
+ httr::user_agent("WikidataR - https://github.com/Ironholds/WikidataR"),
+ ...)
+ httr::stop_for_status(result)
+ return(httr::content(result, as = "parsed", type = "application/json"))
+}
+
+#'@title Extract Claims from Returned Item Data
+#'@description extract claim information from data returned using
+#'\code{\link{get_item}}.
+#'
+#'@param items a list of one or more Wikidata items returned with
+#'\code{\link{get_item}}.
+#'
+#'@param claims a vector of claims (in the form "P321", "P12") to look for
+#'and extract.
+#'
+#'@return a list containing one sub-list for each entry in \code{items},
+#'and (below that) the found data for each claim. In the event a claim
+#'cannot be found for an item, an \code{NA} will be returned
+#'instead.
+#'
+#'@examples
+#'# Get item data
+#'adams_data <- get_item("42")
+#'
+#'# Get claim data
+#'claims <- extract_claims(adams_data, "P31")
+#'
+#'@export
+extract_claims <- function(items, claims){
+ output <- lapply(items, function(x, claims){
+ return(lapply(claims, function(claim, obj){
+ which_match <- which(names(obj$claims) == claim)
+ if(!length(which_match)){
+ return(NA)
+ }
+ return(obj$claims[[which_match[1]]])
+ }, obj = x))
+ }, claims = claims)
+
+ return(output)
+}
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..e38a002
--- /dev/null
+++ b/README.md
@@ -0,0 +1,38 @@
+WikidataR
+=========
+
+An R API wrapper for the Wikidata store of semantic data.
+
+__Author:__ Oliver Keyes, Serena Signorelli & Christian Graul<br/>
+__License:__ [MIT](http://opensource.org/licenses/MIT)<br/>
+__Status:__ Stable
+
+[](https://travis-ci.org/Ironholds/WikidataR)
+
+Description
+======
+WikidataR is a wrapper around the Wikidata API. It is written in and for R, and was inspired by Christian Graul's
+[rwikidata](https://github.com/chgrl/rwikidata) project. For details on how to best use it, see the [explanatory
+vignette](https://CRAN.R-project.org/package=WikidataR/vignettes/Introduction.html).
+
+Please note that this project is released with a
+[Contributor Code of Conduct](https://github.com/Ironholds/WikidataR/blob/master/CONDUCT.md).
+By participating in this project you agree to abide by its terms.
+
+Installation
+======
+
+For the most recent CRAN version:
+
+ install.packages("WikidataR")
+
+For the development version:
+
+ library(devtools)
+ devtools::install_github("ironholds/WikidataR")
+
+Dependencies
+======
+* R. Doy.
+* [httr](https://cran.r-project.org/package=httr) and its dependencies.
+* [WikipediR](https://cran.r-project.org/package=WikipediR)
diff --git a/build/vignette.rds b/build/vignette.rds
new file mode 100644
index 0000000..ca9dd2f
Binary files /dev/null and b/build/vignette.rds differ
diff --git a/inst/doc/Introduction.R b/inst/doc/Introduction.R
new file mode 100644
index 0000000..652eec9
--- /dev/null
+++ b/inst/doc/Introduction.R
@@ -0,0 +1,36 @@
+## ---- eval=FALSE---------------------------------------------------------
+# #Retrieve an item
+# item <- get_item(id = 1)
+#
+# #Get information about the property of the first claim it has.
+# first_claim <- get_property(id = names(item$claims)[1])
+# #Do we succeed? Dewey!
+
+## ---- eval=FALSE---------------------------------------------------------
+# #Retrieve a random item
+# rand_item <- get_random_item()
+#
+# #Retrieve a random property
+# rand_prop <- get_random_property()
+
+## ---- eval=FALSE---------------------------------------------------------
+# #Retrieve 42 random items
+# rand_item <- get_random_item(limit = 42)
+#
+# #Retrieve 42 random properties
+# rand_prop <- get_random_property(limit = 42)
+
+## ---- eval=FALSE---------------------------------------------------------
+# #Find item - find defaults to "en" as a language.
+# aarons <- find_item("Aaron Halfaker")
+#
+# #Find a property - also defaults to "en"
+# first_names <- find_property("first name")
+
+## ---- eval=FALSE---------------------------------------------------------
+# #Find item.
+# all_aarons <- find_item("Aaron Halfaker")
+#
+# #Grab the ID code for the first entry and retrieve the associated item data.
+# first_aaron <- get_item(all_aarons[[1]]$id)
+
diff --git a/inst/doc/Introduction.Rmd b/inst/doc/Introduction.Rmd
new file mode 100644
index 0000000..e22aae2
--- /dev/null
+++ b/inst/doc/Introduction.Rmd
@@ -0,0 +1,82 @@
+<!--
+%\VignetteEngine{knitr::knitr}
+%\VignetteIndexEntry{Introduction to WikidataR}
+-->
+
+# WikidataR: the API client library for Wikidata
+Wikidata is a wonderful and irreplaceable resource for linked data, containing information on pretty much any subject. If there's a Wikipedia article on it, there's almost certainly a Wikidata item for it.
+
+<code>WikidataR</code> - following the naming scheme of [WikipediR](https://github.com/Ironholds/WikipediR#thanks-and-misc) - is an API client library for Wikidata, written in and accessible from R.
+
+## Items and properties
+The two basic component pieces of Wikidata are "items" and "properties". An "item" is a thing - a concept, object or
+topic that exists in the real world, such as "Rush". These items each have statements associated with them - for
+example, "Rush is an instance of: Rock Band". In that statement, "Rock Band" is a property: a class or trait
+that items can hold. Wikidata items are organised as descriptors of the item, in various languages, and references to the properties that that item holds.
+
+## Retrieving specific items or properties
+Items and properties are both identified by numeric IDs, prefaced with "Q" in the case of items,
+and "P" in the case of properties. WikipediR can be used to retrieve items or properties with specific
+ID numbers, using the <code>get\_item</code> and <code>get\_property</code> functions:
+
+```{r, eval=FALSE}
+#Retrieve an item
+item <- get_item(id = 1)
+
+#Get information about the property of the first claim it has.
+first_claim <- get_property(id = names(item$claims)[1])
+#Do we succeed? Dewey!
+```
+
+These functions are capable of accepting various forms for the ID, including (as examples), "Q100" or "100"
+for items, and "Property:P100", "P100" or "100" for properties. They're also vectorised - pass them as many IDs as you want!
+
+## Retrieving randomly-selected items or properties
+As well as retrieving specific items or properties, Wikidata's API also allows for the retrieval of *random*
+elements. With WikidataR, this can be achieved through:
+
+```{r, eval=FALSE}
+#Retrieve a random item
+rand_item <- get_random_item()
+
+#Retrieve a random property
+rand_prop <- get_random_property()
+```
+
+These also allow you to retrieve *sets* of random elements - not just one at a time, but say, 50 at a time - by including the "limit" argument:
+
+```{r, eval=FALSE}
+#Retrieve 42 random items
+rand_item <- get_random_item(limit = 42)
+
+#Retrieve 42 random properties
+rand_prop <- get_random_property(limit = 42)
+```
+
+## Search
+Wikidata's search functionality can also be used, either to find items or to find properties. All you need is
+a search string (which is run over the names and descriptions of items or properties) and a language code
+(since Wikidata's descriptions can be in many languages):
+
+```{r, eval=FALSE}
+#Find item - find defaults to "en" as a language.
+aarons <- find_item("Aaron Halfaker")
+
+#Find a property - also defaults to "en"
+first_names <- find_property("first name")
+```
+
+The resulting search entries have the ID as a key, making it trivial to then retrieve the full corresponding
+items or properties:
+
+```{r, eval=FALSE}
+#Find item.
+all_aarons <- find_item("Aaron Halfaker")
+
+#Grab the ID code for the first entry and retrieve the associated item data.
+first_aaron <- get_item(all_aarons[[1]]$id)
+```
+
+## Other and future functionality
+If you have ideas for other types of useful Wikidata access, the best approach
+is to either [request it](https://github.com/Ironholds/WikidataR/issues) or [add it](https://github.com/Ironholds/WikidataR/pulls)!
diff --git a/inst/doc/Introduction.html b/inst/doc/Introduction.html
new file mode 100644
index 0000000..9bdad3a
--- /dev/null
+++ b/inst/doc/Introduction.html
@@ -0,0 +1,290 @@
+<!DOCTYPE html>
+<html>
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
+
+<title>WikidataR: the API client library for Wikidata</title>
+
+<script type="text/javascript">
+window.onload = function() {
+ var imgs = document.getElementsByTagName('img'), i, img;
+ for (i = 0; i < imgs.length; i++) {
+ img = imgs[i];
+ // center an image if it is the only element of its parent
+ if (img.parentElement.childElementCount === 1)
+ img.parentElement.style.textAlign = 'center';
+ }
+};
+</script>
+
+<!-- Styles for R syntax highlighter -->
+<style type="text/css">
+ pre .operator,
+ pre .paren {
+ color: rgb(104, 118, 135)
+ }
+
+ pre .literal {
+ color: #990073
+ }
+
+ pre .number {
+ color: #099;
+ }
+
+ pre .comment {
+ color: #998;
+ font-style: italic
+ }
+
+ pre .keyword {
+ color: #900;
+ font-weight: bold
+ }
+
+ pre .identifier {
+ color: rgb(0, 0, 0);
+ }
+
+ pre .string {
+ color: #d14;
+ }
+</style>
+
+<!-- R syntax highlighter -->
+<script type="text/javascript">
+var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/</gm,"<")}function f(r,q,p){return RegExp(q,"m"+(r.cI?"i":"")+(p?"g":""))}function b(r){for(var p=0;p<r.childNodes.length;p++){var q=r.childNodes[p];if(q.nodeName=="CODE"){return q}if(!(q.nodeType==3&&q.nodeValue.match(/\s+/))){break}}}function h(t,s){var p="";for(var r=0;r<t.childNodes.length;r++){if(t.childNodes[r].nodeType==3){var q=t.childNodes[r].nodeValue;if(s){q=q.replace(/\n/g,"")}p+=q}else{if(t.chi [...]
+hljs.initHighlightingOnLoad();
+</script>
+
+
+
+<style type="text/css">
+body, td {
+ font-family: sans-serif;
+ background-color: white;
+ font-size: 13px;
+}
+
+body {
+ max-width: 800px;
+ margin: auto;
+ padding: 1em;
+ line-height: 20px;
+}
+
+tt, code, pre {
+ font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace;
+}
+
+h1 {
+ font-size:2.2em;
+}
+
+h2 {
+ font-size:1.8em;
+}
+
+h3 {
+ font-size:1.4em;
+}
+
+h4 {
+ font-size:1.0em;
+}
+
+h5 {
+ font-size:0.9em;
+}
+
+h6 {
+ font-size:0.8em;
+}
+
+a:visited {
+ color: rgb(50%, 0%, 50%);
+}
+
+pre, img {
+ max-width: 100%;
+}
+pre {
+ overflow-x: auto;
+}
+pre code {
+ display: block; padding: 0.5em;
+}
+
+code {
+ font-size: 92%;
+ border: 1px solid #ccc;
+}
+
+code[class] {
+ background-color: #F8F8F8;
+}
+
+table, td, th {
+ border: none;
+}
+
+blockquote {
+ color:#666666;
+ margin:0;
+ padding-left: 1em;
+ border-left: 0.5em #EEE solid;
+}
+
+hr {
+ height: 0px;
+ border-bottom: none;
+ border-top-width: thin;
+ border-top-style: dotted;
+ border-top-color: #999999;
+}
+
+@media print {
+ * {
+ background: transparent !important;
+ color: black !important;
+ filter:none !important;
+ -ms-filter: none !important;
+ }
+
+ body {
+ font-size:12pt;
+ max-width:100%;
+ }
+
+ a, a:visited {
+ text-decoration: underline;
+ }
+
+ hr {
+ visibility: hidden;
+ page-break-before: always;
+ }
+
+ pre, blockquote {
+ padding-right: 1em;
+ page-break-inside: avoid;
+ }
+
+ tr, img {
+ page-break-inside: avoid;
+ }
+
+ img {
+ max-width: 100% !important;
+ }
+
+ @page :left {
+ margin: 15mm 20mm 15mm 10mm;
+ }
+
+ @page :right {
+ margin: 15mm 10mm 15mm 20mm;
+ }
+
+ p, h2, h3 {
+ orphans: 3; widows: 3;
+ }
+
+ h2, h3 {
+ page-break-after: avoid;
+ }
+}
+</style>
+
+
+
+</head>
+
+<body>
+<!--
+%\VignetteEngine{knitr::knitr}
+%\VignetteIndexEntry{Introduction to WikidataR}
+-->
+
+<h1>WikidataR: the API client library for Wikidata</h1>
+
+<p>Wikidata is a wonderful and irreplaceable resource for linked data, containing information on pretty much any subject. If there's a Wikipedia article on it, there's almost certainly a Wikidata item for it.</p>
+
+<p><code>WikidataR</code> - following the naming scheme of <a href="https://github.com/Ironholds/WikipediR#thanks-and-misc">WikipediR</a> - is an API client library for Wikidata, written in and accessible from R.</p>
+
+<h2>Items and properties</h2>
+
+<p>The two basic component pieces of Wikidata are “items” and “properties”. An “item” is a thing - a concept, object or
+topic that exists in the real world, such as “Rush”. These items each have statements associated with them - for
+example, “Rush is an instance of: Rock Band”. In that statement, “Rock Band” is a property: a class or trait
+that items can hold. Wikidata items are organised as descriptors of the item, in various languages, and references to the properties that that item holds.</p>
+
+<h2>Retrieving specific items or properties</h2>
+
+<p>Items and properties are both identified by numeric IDs, prefaced with “Q” in the case of items,
+and “P” in the case of properties.
WikipediR can be used to retrieve items or properties with specific +ID numbers, using the <code>get_item</code> and <code>get_property</code> functions:</p> + +<pre><code class="r">#Retrieve an item +item <- get_item(id = 1) + +#Get information about the property of the first claim it has. +first_claim <- get_property(id = names(item$claims)[1]) +#Do we succeed? Dewey! +</code></pre> + +<p>These functions are capable of accepting various forms for the ID, including (as examples), “Q100” or “100” +for items, and “Property:P100”, “P100” or “100” for properties. They're also vectorised - pass them as many IDs as you want!</p> + +<h2>Retrieving randomly-selected items or properties</h2> + +<p>As well as retrieving specific items or properties, Wikidata's API also allows for the retrieval of <em>random</em> +elements. With WikidataR, this can be achieved through:</p> + +<pre><code class="r">#Retrieve a random item +rand_item <- get_random_item() + +#Retrieve a random property +rand_prop <- get_random_property() +</code></pre> + +<p>These also allow you to retrieve <em>sets</em> of random elements - not just one at a time, but say, 50 at a time - by including the “limit” argument:</p> + +<pre><code class="r">#Retrieve 42 random items +rand_item <- get_random_item(limit = 42) + +#Retrieve 42 random properties +rand_prop <- get_random_property(limit = 42) +</code></pre> + +<h2>Search</h2> + +<p>Wikidata's search functionality can also be used, either to find items or to find properties. All you need is +a search string (which is run over the names and descriptions of items or properties) and a language code +(since Wikidata's descriptions can be in many languages):</p> + +<pre><code class="r">#Find item - find defaults to "en" as a language. 
+aarons <- find_item("Aaron Halfaker") + +#Find a property - also defaults to "en" +first_names <- find_property("first name") +</code></pre> + +<p>The resulting search entries have the ID as a key, making it trivial to then retrieve the full corresponding +items or properties:</p> + +<pre><code class="r">#Find item. +all_aarons <- find_item("Aaron Halfaker") + +#Grab the ID code for the first entry and retrieve the associated item data. +first_aaron <- get_item(all_aarons[[1]]$id) +</code></pre> + +<h2>Other and future functionality</h2> + +<p>If you have ideas for other types of useful Wikidata access, the best approach +is to either <a href="https://github.com/Ironholds/WikidataR/issues">request it</a> or <a href="https://github.com/Ironholds/WikidataR/pulls">add it</a>!</p> + +</body> + +</html> diff --git a/man/WikidataR.Rd b/man/WikidataR.Rd new file mode 100644 index 0000000..2ea9768 --- /dev/null +++ b/man/WikidataR.Rd @@ -0,0 +1,18 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/WikidataR.R +\docType{package} +\name{WikidataR} +\alias{WikidataR} +\alias{WikidataR-package} +\alias{WikidataR-package} +\title{API client library for Wikidata} +\description{ +This package serves as an API client for \href{Wikidata}{https://www.wikidata.org}. +See the accompanying vignette for more details. +} +\seealso{ +\code{\link{get_random}} for selecting a random item or property, +\code{\link{get_item}} for a /specific/ item or property, or \code{\link{find_item}} +for using search functionality to pull out item or property IDs where the descriptions +or aliases match a particular search term. 
+} diff --git a/man/extract_claims.Rd b/man/extract_claims.Rd new file mode 100644 index 0000000..7f32f98 --- /dev/null +++ b/man/extract_claims.Rd @@ -0,0 +1,33 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/utils.R +\name{extract_claims} +\alias{extract_claims} +\title{Extract Claims from Returned Item Data} +\usage{ +extract_claims(items, claims) +} +\arguments{ +\item{items}{a list of one or more Wikidata items returned with +\code{\link{get_item}}.} + +\item{claims}{a vector of claims (in the form "P321", "P12") to look for +and extract.} +} +\value{ +a list containing one sub-list for each entry in \code{items}, +and (below that) the found data for each claim. In the event a claim +cannot be found for an item, an \code{NA} will be returned +instead. +} +\description{ +extract claim information from data returned using +\code{\link{get_item}}. +} +\examples{ +# Get item data +adams_data <- get_item("42") + +# Get claim data +claims <- extract_claims(adams_data, "P31") + +} diff --git a/man/find_item.Rd b/man/find_item.Rd new file mode 100644 index 0000000..d2af2bd --- /dev/null +++ b/man/find_item.Rd @@ -0,0 +1,41 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/gets.R +\name{find_item} +\alias{find_item} +\alias{find_property} +\alias{find_property} +\title{Search for Wikidata items or properties that match a search term} +\usage{ +find_item(search_term, language = "en", limit = 10, ...) + +find_property(search_term, language = "en", limit = 10) +} +\arguments{ +\item{search_term}{a term to search for.} + +\item{language}{the language to return the labels and descriptions in; this should +consist of an ISO language code. 
Set to "en" by default.} + +\item{limit}{the number of results to return; set to 10 by default.} + +\item{...}{further arguments to pass to httr's GET.} +} +\description{ +\code{find_item} and \code{find_property} allow you to retrieve a set +of Wikidata items or properties where the aliase or descriptions match a particular +search term. As with other \code{WikidataR} code, custom print methods are available; +use \code{\link{str}} to manipulate and see the underlying structure of the data. +} +\examples{ + +#Check for entries relating to Douglas Adams in some way +adams_items <- find_item("Douglas Adams") + +#Check for properties involving the peerage +peerage_props <- find_property("peerage") + +} +\seealso{ +\code{\link{get_random}} for selecting a random item or property, +or \code{\link{get_item}} for selecting a specific item or property. +} diff --git a/man/get_geo_box.Rd b/man/get_geo_box.Rd new file mode 100644 index 0000000..899c005 --- /dev/null +++ b/man/get_geo_box.Rd @@ -0,0 +1,58 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/geo.R +\name{get_geo_box} +\alias{get_geo_box} +\title{Get geographic entities based on a bounding box} +\usage{ +get_geo_box(first_city_code, first_corner, second_city_code, second_corner, + language = "en", ...) +} +\arguments{ +\item{first_city_code}{a Wikidata item, or series of items, to use for +one corner of the bounding box.} + +\item{first_corner}{the direction of \code{first_city_code} relative +to \code{city} (eg "NorthWest", "SouthEast").} + +\item{second_city_code}{a Wikidata item, or series of items, to use for +one corner of the bounding box.} + +\item{second_corner}{the direction of \code{second_city_code} relative +to \code{city} (eg "NorthWest", "SouthEast").} + +\item{language}{the two-letter language code to use for the name +of the item. 
"en" by default.} + +\item{...}{further arguments to pass to httr's GET.} +} +\value{ +a data.frame of 5 columns: +\itemize{ + \item{item}{ the Wikidata identifier of each object associated with + \code{entity}.} + \item{name}{ the name of the item, if available, in the requested language. If it + is not available, \code{NA} will be returned instead.} + \item{latitude}{ the latitude of \code{item}} + \item{longitude}{ the longitude of \code{item}} + \item{entity}{ the entity the item is associated with (necessary for multi-entity + queries).} +} +} +\description{ +\code{get_geo_box} retrieves all geographic entities in +Wikidata that fall between a bounding box between two existing items +with geographic attributes (usually cities). +} +\examples{ +# Simple bounding box +bruges_box <- WikidataR:::get_geo_box("Q12988", "NorthEast", "Q184287", "SouthWest") + +# Custom language +bruges_box_fr <- WikidataR:::get_geo_box("Q12988", "NorthEast", "Q184287", "SouthWest", + language = "fr") + +} +\seealso{ +\code{\link{get_geo_entity}} for using an unrestricted search or simple radius, +rather than a bounding box. +} diff --git a/man/get_geo_entity.Rd b/man/get_geo_entity.Rd new file mode 100644 index 0000000..ccec09e --- /dev/null +++ b/man/get_geo_entity.Rd @@ -0,0 +1,58 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/geo.R +\name{get_geo_entity} +\alias{get_geo_entity} +\title{Retrieve geographic information from Wikidata} +\usage{ +get_geo_entity(entity, language = "en", radius = NULL, ...) +} +\arguments{ +\item{entity}{a Wikidata item (\code{Q...}) or series of items, to check +for associated geo-tagged items.} + +\item{language}{the two-letter language code to use for the name +of the item. 
"en" by default, because we're imperialist +anglocentric westerners.} + +\item{radius}{optionally, a radius (in kilometers) around \code{entity} +to restrict the search to.} + +\item{...}{further arguments to pass to httr's GET.} +} +\value{ +a data.frame of 5 columns: +\itemize{ + \item{item}{ the Wikidata identifier of each object associated with + \code{entity}.} + \item{name}{ the name of the item, if available, in the requested language. If it + is not available, \code{NA} will be returned instead.} + \item{latitude}{ the latitude of \code{item}} + \item{longitude}{ the longitude of \code{item}} + \item{entity}{ the entity the item is associated with (necessary for multi-entity + queries).} +} +} +\description{ +\code{get_geo_entity} retrieves the item ID, latitude +and longitude of any object with geographic data associated with \emph{another} +object with geographic data (example: all the locations around/near/associated with +a city). +} +\examples{ +# All entities +sf_locations <- get_geo_entity("Q62") + +# Entities with French, rather than English, names +sf_locations <- get_geo_entity("Q62", language = "fr") + +# Entities within 1km +sf_close_locations <- get_geo_entity("Q62", radius = 1) + +# Multiple entities +multi_entity <- get_geo_entity(entity = c("Q62", "Q64")) + +} +\seealso{ +\code{\link{get_geo_box}} for using a bounding box +rather than an unrestricted search or simple radius. +} diff --git a/man/get_item.Rd b/man/get_item.Rd new file mode 100644 index 0000000..830d4f5 --- /dev/null +++ b/man/get_item.Rd @@ -0,0 +1,42 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/gets.R +\name{get_item} +\alias{get_item} +\alias{get_property} +\alias{get_property} +\title{Retrieve specific Wikidata items or properties} +\usage{ +get_item(id, ...) + +get_property(id, ...) +} +\arguments{ +\item{id}{the ID number(s) of the item or property you're looking for. 
This can be in +various formats; either a numeric value ("200"), the full name ("Q200") or +even with an included namespace ("Property:P10") - the function will format +it appropriately. This function is vectorised and will happily accept +multiple IDs.} + +\item{...}{further arguments to pass to httr's GET.} +} +\description{ +\code{get_item} and \code{get_property} allow you to retrieve the data associated +with individual Wikidata items and properties, respectively. As with +other \code{WikidataR} code, custom print methods are available; use \code{\link{str}} +to manipulate and see the underlying structure of the data. +} +\examples{ + +#Retrieve a specific item +adams_metadata <- get_item("42") + +#Retrieve a specific property +object_is_child <- get_property("P40") + +} +\seealso{ +\code{\link{get_random}} for selecting a random item or property, +or \code{\link{find_item}} for using search functionality to pull out +item or property IDs where the descriptions or aliases match a particular +search term. +} diff --git a/man/get_random.Rd b/man/get_random.Rd new file mode 100644 index 0000000..7edfaf1 --- /dev/null +++ b/man/get_random.Rd @@ -0,0 +1,39 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/gets.R +\name{get_random_item} +\alias{get_random_item} +\alias{get_random} +\alias{get_random_property} +\alias{get_random_property} +\title{Retrieve randomly-selected Wikidata items or properties} +\usage{ +get_random_item(limit = 1, ...) + +get_random_property(limit = 1, ...) +} +\arguments{ +\item{limit}{how many random items to return. 1 by default, but can be higher.} + +\item{...}{arguments to pass to httr's GET.} +} +\description{ +\code{get_random_item} and \code{get_random_property} allow you to retrieve the data +associated with randomly-selected Wikidata items and properties, respectively. 
As with +other \code{WikidataR} code, custom print methods are available; use \code{\link{str}} +to manipulate and see the underlying structure of the data. +} +\examples{ + +#Random item +random_item <- get_random_item() + +#Random property +random_property <- get_random_property() + +} +\seealso{ +\code{\link{get_item}} for selecting a specific item or property, +or \code{\link{find_item}} for using search functionality to pull out +item or property IDs where the descriptions or aliases match a particular +search term. +} diff --git a/man/print.find_item.Rd b/man/print.find_item.Rd new file mode 100644 index 0000000..0bfbccc --- /dev/null +++ b/man/print.find_item.Rd @@ -0,0 +1,16 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/prints.R +\name{print.find_item} +\alias{print.find_item} +\title{Print method for find_item} +\usage{ +\method{print}{find_item}(x, ...) +} +\arguments{ +\item{x}{find_item object with search results} + +\item{\dots}{Arguments to be passed to methods} +} +\description{ +print found items. +} diff --git a/man/print.find_property.Rd b/man/print.find_property.Rd new file mode 100644 index 0000000..a7f4e4f --- /dev/null +++ b/man/print.find_property.Rd @@ -0,0 +1,16 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/prints.R +\name{print.find_property} +\alias{print.find_property} +\title{Print method for find_property} +\usage{ +\method{print}{find_property}(x, ...) +} +\arguments{ +\item{x}{find_property object with search results} + +\item{\dots}{Arguments to be passed to methods} +} +\description{ +print found properties. 
+} diff --git a/man/print.wikidata.Rd b/man/print.wikidata.Rd new file mode 100644 index 0000000..8e3b076 --- /dev/null +++ b/man/print.wikidata.Rd @@ -0,0 +1,19 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/prints.R +\name{print.wikidata} +\alias{print.wikidata} +\title{Print method for Wikidata objects} +\usage{ +\method{print}{wikidata}(x, ...) +} +\arguments{ +\item{x}{wikidata object from get_item, get_random_item, get_property or get_random_property} + +\item{\dots}{Arguments to be passed to methods} +} +\description{ +print found objects generally. +} +\seealso{ +get_item, get_random_item, get_property or get_random_property +} diff --git a/tests/testthat.R b/tests/testthat.R new file mode 100644 index 0000000..34fbd73 --- /dev/null +++ b/tests/testthat.R @@ -0,0 +1,4 @@ +library(testthat) +library(WikidataR) + +test_check("WikidataR") diff --git a/tests/testthat/test_geo.R b/tests/testthat/test_geo.R new file mode 100644 index 0000000..ac7e88b --- /dev/null +++ b/tests/testthat/test_geo.R @@ -0,0 +1,48 @@ +testthat::context("Geographic queries") + +testthat::test_that("Simple entity-based geo lookups work", { + field_names <- c("item", "name", "latitutde", "longitude", "entity") + sf_locations <- get_geo_entity("Q62") + testthat::expect_true(is.data.frame(sf_locations)) + testthat::expect_true(all(field_names == names(sf_locations))) + testthat::expect_true(unique(sf_locations$entity) == "Q62") +}) + +testthat::test_that("Language-variant entity-based geo lookups work", { + field_names <- c("item", "name", "latitutde", "longitude", "entity") + sf_locations <- get_geo_entity("Q62", language = "fr") + testthat::expect_true(is.data.frame(sf_locations)) + testthat::expect_true(all(field_names == names(sf_locations))) + testthat::expect_true(unique(sf_locations$entity) == "Q62") +}) + +testthat::test_that("Radius restricted entity-based geo lookups work", { + field_names <- c("item", "name", "latitutde", "longitude", 
"entity") + sf_locations <- get_geo_entity("Q62", radius = 1) + testthat::expect_true(is.data.frame(sf_locations)) + testthat::expect_true(all(field_names == names(sf_locations))) + testthat::expect_true(unique(sf_locations$entity) == "Q62") +}) + +testthat::test_that("multi-entity geo lookups work", { + field_names <- c("item", "name", "latitutde", "longitude", "entity") + sf_locations <- get_geo_entity(c("Q62", "Q64"), radius = 1) + testthat::expect_true(is.data.frame(sf_locations)) + testthat::expect_true(all(field_names == names(sf_locations))) + testthat::expect_equal(length(unique(sf_locations$entity)), 2) +}) + +testthat::test_that("Simple bounding lookups work", { + field_names <- c("item", "name", "latitutde", "longitude") + bruges_box <- get_geo_box("Q12988", "NorthEast", "Q184287", "SouthWest") + testthat::expect_true(is.data.frame(bruges_box)) + testthat::expect_true(all(field_names == names(bruges_box))) +}) + +testthat::test_that("Language-variant bounding lookups work", { + field_names <- c("item", "name", "latitutde", "longitude") + bruges_box <- get_geo_box("Q12988", "NorthEast", "Q184287", "SouthWest", + language = "fr") + testthat::expect_true(is.data.frame(bruges_box)) + testthat::expect_true(all(field_names == names(bruges_box))) +}) \ No newline at end of file diff --git a/tests/testthat/test_gets.R b/tests/testthat/test_gets.R new file mode 100644 index 0000000..2164d4b --- /dev/null +++ b/tests/testthat/test_gets.R @@ -0,0 +1,30 @@ +context("Direct Wikidata get functions") + +test_that("A specific item can be retrieved with an entire item code", { + expect_true({get_item("Q100");TRUE}) +}) + +test_that("A specific item can be retrieved with a partial entire item code", { + expect_true({get_item("100");TRUE}) +}) + +test_that("A specific property can be retrieved with an entire prop code + namespace", { + expect_true({get_property("Property:P10");TRUE}) +}) + +test_that("A specific property can be retrieved with an entire prop code + 
namespace", { + expect_true({get_property("P10");TRUE}) +}) + + +test_that("A specific property can be retrieved with a partial prop code", { + expect_true({get_property("10");TRUE}) +}) + +test_that("A randomly-selected item can be retrieved",{ + expect_true({get_random_item();TRUE}) +}) + +test_that("A randomly-selected property can be retriveed",{ + expect_true({get_random_property();TRUE}) +}) \ No newline at end of file diff --git a/tests/testthat/test_search.R b/tests/testthat/test_search.R new file mode 100644 index 0000000..3588f82 --- /dev/null +++ b/tests/testthat/test_search.R @@ -0,0 +1,17 @@ +context("Search functions") + +test_that("English-language search works",{ + expect_true({find_item("Wonder Girls", "en");TRUE}) +}) + +test_that("Non-English-language search works",{ + expect_true({find_item("Wonder Girls", "es");TRUE}) +}) + +test_that("Search with limit modding works",{ + expect_that(length(find_item("Wonder Girls", "en", 3)), equals(3)) +}) + +test_that("Property search works",{ + expect_true({find_property("Music", "en");TRUE}) +}) \ No newline at end of file diff --git a/vignettes/Introduction.Rmd b/vignettes/Introduction.Rmd new file mode 100644 index 0000000..e22aae2 --- /dev/null +++ b/vignettes/Introduction.Rmd @@ -0,0 +1,82 @@ +<!-- +%\VignetteEngine{knitr::knitr} +%\VignetteIndexEntry{Introduction to WikidataR} +--> + +# WikidataR: the API client library for Wikidata +Wikidata is a wonderful and irreplaceable resource for linked data, containing information on pretty much any subject. If there's a Wikipedia article on it, there's almost certainly a Wikidata item for it. + +<code>WikidataR</code> - following the naming scheme of [WikipediR](https://github.com/Ironholds/WikipediR#thanks-and-misc) - is an API client library for Wikidata, written in and accessible from R. + +## Items and properties +The two basic component pieces of Wikidata are "items" and "properties". 
An "item" is a thing - a concept, object or +topic that exists in the real world, such as "Rush". These items each have statements associated with them - for +example, "Rush is an instance of: Rock Band". In that statement, "Rock Band" is a property: a class or trait +that items can hold. Wikidata items are organised as descriptors of the item, in various languages, and references to the properties that that item holds. + +## Retrieving specific items or properties +Items and properties are both identified by numeric IDs, prefaced with "Q" in the case of items, +and "P" in the case of properties. WikipediR can be used to retrieve items or properties with specific +ID numbers, using the <code>get\_item</code> and <code>get\_property</code> functions: + +```{r, eval=FALSE} +#Retrieve an item +item <- get_item(id = 1) + +#Get information about the property of the first claim it has. +first_claim <- get_property(id = names(item$claims)[1]) +#Do we succeed? Dewey! +``` + +These functions are capable of accepting various forms for the ID, including (as examples), "Q100" or "100" +for items, and "Property:P100", "P100" or "100" for properties. They're also vectorised - pass them as many IDs as you want! + +## Retrieving randomly-selected items or properties +As well as retrieving specific items or properties, Wikidata's API also allows for the retrieval of *random* +elements. 
With WikidataR, this can be achieved through: + +```{r, eval=FALSE} +#Retrieve a random item +rand_item <- get_random_item() + +#Retrieve a random property +rand_prop <- get_random_property() +``` + +These also allow you to retrieve *sets* of random elements - not just one at a time, but say, 50 at a time - by including the "limit" argument: + +```{r, eval=FALSE} +#Retrieve 42 random items +rand_item <- get_random_item(limit = 42) + +#Retrieve 42 random properties +rand_prop <- get_random_property(limit = 42) +``` + +## Search +Wikidata's search functionality can also be used, either to find items or to find properties. All you need is +a search string (which is run over the names and descriptions of items or properties) and a language code +(since Wikidata's descriptions can be in many languages): + +```{r, eval=FALSE} +#Find item - find defaults to "en" as a language. +aarons <- find_item("Aaron Halfaker") + +#Find a property - also defaults to "en" +first_names <- find_property("first name") +``` + +The resulting search entries have the ID as a key, making it trivial to then retrieve the full corresponding +items or properties: + +```{r, eval=FALSE} +#Find item. +all_aarons <- find_item("Aaron Halfaker") + +#Grab the ID code for the first entry and retrieve the associated item data. +first_aaron <- get_item(all_aarons[[1]]$id) +``` + +## Other and future functionality +If you have ideas for other types of useful Wikidata access, the best approach +is to either [request it](https://github.com/Ironholds/WikidataR/issues) or [add it](https://github.com/Ironholds/WikidataR/pulls)! -- Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-cran-wikidatar.git
