Re: [R] Failure to understand namespaces in XML::getNodeSet
Hadley,

It's sometimes amazing the mistakes I can make. No, it did not do what I wanted, which was

read_xml(str_c(with_ns_xml, collapse = ""))

Reproducible example follows:

library(stringr)
library(xml2)
## Given the correct argument value for collapse, the next two lines work
no_ns <- read_xml(str_c(no_ns_xml, collapse = ""))
with_ns <- read_xml(str_c(with_ns_xml, collapse = ""))
## The next line finds the node in the XML without a namespace
xml_find_all(no_ns, "//WorkSet//Description")
## With a namespace designated in the XML
## Neither of the next two work, though I thought the second should
xml_find_all(with_ns, "//WorkSet//Description")
xml_find_all(with_ns, "/WorkSet//Description", ns = xml_ns(with_ns))
## Using xml_ns_strip() works as predicted
xml_find_all(xml_ns_strip(with_ns), "//WorkSet//Description")
## I was surprised to find the incorrect namespace value did not matter
xml_find_all(no_ns, "//WorkSet//Description", ns = xml_ns(with_ns))
## This also seems to ignore the namespace argument value
xml_find_all(xml_ns_strip(with_ns), "/WorkSet//Description", ns = xml_ns(with_ns))

Full output follows:

> ## Given the correct argument value for collapse, the next two lines work
> no_ns <- read_xml(str_c(no_ns_xml, collapse = ""))
> with_ns <- read_xml(str_c(with_ns_xml, collapse = ""))
> ## The next line finds the node in the XML without a namespace
> xml_find_all(no_ns, "//WorkSet//Description")
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
> ## With a namespace designated in the XML
> ## Neither of the next two work, though I thought the second should
> xml_find_all(with_ns, "//WorkSet//Description")
{xml_nodeset (0)}
> xml_find_all(with_ns, "/WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (0)}
> ## Using xml_ns_strip() works as predicted
> xml_find_all(xml_ns_strip(with_ns), "//WorkSet//Description")
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
> ## I was surprised to find the incorrect namespace value did not matter
> xml_find_all(no_ns, "//WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
> ## This also seems to ignore the namespace argument value
> xml_find_all(xml_ns_strip(with_ns), "/WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>

R. Mark Sharp, Ph.D.
msh...@txbiomed.org
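A note on the two calls that returned an empty node set. XPath 1.0 has no notion of a default namespace, so an unprefixed step such as WorkSet only matches elements that are in no namespace. The ns argument just maps prefixes that appear in the XPath expression to namespace URIs; an expression with no prefixes never uses it, which would explain why passing a "wrong" ns made no difference. The sketch below assumes the document is the with_ns_xml text from this thread written as a single string, and relies on xml_ns() naming the default namespace d1. The prefix etl is an arbitrary choice made here for illustration.

```r
library(xml2)

# The namespaced document from the thread, written as one string.
with_ns <- read_xml(
  '<WorkSet xmlns="http://labkey.org/etl/xml"><Description>MFIA 9-Plex (CharlesRiver)</Description></WorkSet>'
)

# xml_ns() gives the default namespace the prefix d1.
ns <- xml_ns(with_ns)

# Unprefixed steps only match elements in no namespace, so nothing is found.
unprefixed <- xml_find_all(with_ns, "/WorkSet//Description", ns = ns)

# Every step that names a namespaced element needs the prefix.
hits <- xml_find_all(with_ns, "/d1:WorkSet//d1:Description", ns = ns)
xml_text(hits)

# Optional: rename d1 to a prefix of your own choosing ("etl" is arbitrary).
ns_etl <- xml_ns_rename(ns, d1 = "etl")
hits_etl <- xml_find_all(with_ns, "//etl:Description", ns = ns_etl)
```

Unlike xml_ns_strip(), this leaves the document itself untouched.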
Re: [R] Failure to understand namespaces in XML::getNodeSet
I think you want

x <- read_xml('
  <WorkSet xmlns="http://labkey.org/etl/xml">
    <Description>MFIA 9-Plex (CharlesRiver)</Description>
  </WorkSet>
')

The collapse argument does not do what you think it does.

Hadley

--
http://hadley.nz
Re: [R] Failure to understand namespaces in XML::getNodeSet
Hadley,

Thank you. I am able to get the xml_ns_strip() function to work with my file directly, so I will likely be able to reach my immediate goal.

However, I still have had no success with understanding the namespace problem. I am not able to use read_xml() on the object I generated for the reproducible example, which is simply a character vector of length 4 holding the contents of the XML file as produced by readLines(). I then used dput() to define the structure. The resulting structure apparently is not to the liking of read_xml(). I have reproduced the necessary code here for your convenience. The error is below.

##
library(xml2)
library(stringr)
with_ns_xml <- c("",
                 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
                 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
                 "</WorkSet>")
## Without str_c() and collapse, it also complains of a vector of length > 1.
read_xml(str_c(with_ns_xml, collapse = TRUE))

## Produces the following error message.
Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, :
  Start tag expected, '<' not found [4]

I have similar issues with xml2::xml_find_all():
xml_find_all(str_c(with_ns_xml, collapse = TRUE), "/WorkSet//Description")

## Produces the following error message.
Error in UseMethod("xml_find_all") :
  no applicable method for 'xml_find_all' applied to an object of class "character"

R. Mark Sharp, Ph.D.
msh...@txbiomed.org

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
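Both errors above come from the same two points: collapse expects the separator string to put between the elements, not a logical flag, and xml_find_all() needs a parsed document rather than a character string. A minimal sketch follows, using base paste() instead of str_c() so that it does not depend on how a particular stringr version treats collapse = TRUE. The variable names xml_lines, xml_string and doc are made up for this example.

```r
library(xml2)

# Lines as readLines() might return them (namespaced example from the thread).
xml_lines <- c(
  '<WorkSet xmlns="http://labkey.org/etl/xml">',
  "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
  "</WorkSet>"
)

# collapse is the separator placed between elements; the result has length 1.
xml_string <- paste(xml_lines, collapse = "\n")

# Parse first; xml_find_all() and friends work on the parsed document,
# not on the character string.
doc <- read_xml(xml_string)
root_name <- xml_name(xml_root(doc))
```

When the XML lives in a file, passing the file path to read_xml() directly avoids the readLines() and collapse step altogether.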
Re: [R] Failure to understand namespaces in XML::getNodeSet
See the last example in ?xml2::xml_find_all or use xml2::xml_ns_strip()

Hadley

--
http://hadley.nz
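For the xml_ns_strip() route, here is a minimal sketch on the namespaced example from this thread, written as one string. One thing worth knowing, going by the example in the xml2 help page for xml_ns_strip(): the function appears to modify the document in place, so later queries on the same object also see the stripped version. The names before and after are made up for this example.

```r
library(xml2)

doc <- read_xml(
  '<WorkSet xmlns="http://labkey.org/etl/xml"><Description>MFIA 9-Plex (CharlesRiver)</Description></WorkSet>'
)

# With the default namespace present, the unprefixed XPath matches nothing.
before <- length(xml_find_all(doc, "//WorkSet//Description"))

# Strip the default namespace; doc itself is changed, no assignment needed.
xml_ns_strip(doc)

# The same unprefixed XPath now finds the node.
after <- length(xml_find_all(doc, "//WorkSet//Description"))
```

If the original namespaces matter later, for example when writing the file back out, re-read the document or query with a prefix instead of stripping.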
[R] Failure to understand namespaces in XML::getNodeSet
I am trying to read a series of XML files that use a namespace and I have failed, thus far, to discover the proper syntax. I have a reproducible example below. I have two XML character strings defined: one without a namespace and one with. I show that I can successfully extract the node using the XML string without the namespace and fail when using the XML string with the namespace.

Mark

PS I am having the same problem with the xml2 package and am hoping that understanding one will help with the other.

##
library(XML)
## The first XML text (no_ns_xml) does not have a namespace defined
no_ns_xml <- c("", "<WorkSet>",
               "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
               "</WorkSet>")
l_no_ns_xml <- xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
                            useInternalNodes = TRUE)
## The node is found
getNodeSet(l_no_ns_xml, "/WorkSet//Description")

## The second XML text (with_ns_xml) has a namespace defined
with_ns_xml <- c("",
                 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
                 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
                 "</WorkSet>")

l_with_ns_xml <- xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
                              useInternalNodes = TRUE)
## The node is not found
getNodeSet(l_with_ns_xml, "/WorkSet//Description")
## I attempt to provide the namespace, but fail.
ns <- "http://labkey.org/etl/xml"
names(ns)[1] <- "xmlns"
getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)

R. Mark Sharp, Ph.D.
Director of Data Science Core
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msh...@txbiomed.org
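For the XML package, the last attempt above names the namespace but never uses that name in the XPath expression, so the steps still look for elements in no namespace. A sketch of the usual fix is below, assuming the namespaced document from this post written as a single string: choose a prefix, bind it to the URI in namespaces, and put it on every element step. The prefix etl and the variable names are arbitrary choices for this example.

```r
library(XML)

with_ns_text <- '<WorkSet xmlns="http://labkey.org/etl/xml"><Description>MFIA 9-Plex (CharlesRiver)</Description></WorkSet>'
doc <- xmlParse(with_ns_text, asText = TRUE)

# Bind an arbitrary prefix to the document's default namespace URI.
ns <- c(etl = "http://labkey.org/etl/xml")

# Unprefixed steps match only elements in no namespace: nothing is found.
unprefixed <- getNodeSet(doc, "/WorkSet//Description")

# Prefixed steps match the namespaced elements.
nodes <- getNodeSet(doc, "/etl:WorkSet//etl:Description", namespaces = ns)
values <- sapply(nodes, xmlValue)
```

The prefix used in the XPath does not have to match anything in the file; only the URI it is bound to matters.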