Re: [R] Failure to understand namespaces in XML::getNodeSet

2017-01-31 Thread Mark Sharp
Hadley,

It’s sometimes amazing the mistakes I can make. No, it did not do what I 
wanted, which was
read_xml(str_c(with_ns_xml, collapse = ""))

Reproducible example follows:
library(stringr)
library(xml2)
## Given the correct argument value for collapse, the next two lines work
no_ns <- read_xml(str_c(no_ns_xml, collapse = ""))
with_ns <- read_xml(str_c(with_ns_xml, collapse = ""))
## The next line finds the node in the XML without a namespace
xml_find_all(no_ns, "//WorkSet//Description")
## With a namespace designated in the XML
## Neither of the next two work, though I thought the second should
xml_find_all(with_ns, "//WorkSet//Description")
xml_find_all(with_ns, "/WorkSet//Description", ns = xml_ns(with_ns))
## Using xml_ns_strip() works as predicted
xml_find_all(xml_ns_strip(with_ns), "//WorkSet//Description")
## I was surprised to find the incorrect namespace value did not matter
xml_find_all(no_ns, "//WorkSet//Description", ns = xml_ns(with_ns))
## This also seems to ignore the namespace argument value
xml_find_all(xml_ns_strip(with_ns), "/WorkSet//Description", ns = 
xml_ns(with_ns))


Full output follows:
> ## Given the correct argument value for collapse, the next two lines work
> no_ns <- read_xml(str_c(no_ns_xml, collapse = ""))
> with_ns <- read_xml(str_c(with_ns_xml, collapse = ""))
> ## The next line finds the node in the XML without a namespace
> xml_find_all(no_ns, "//WorkSet//Description")
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
> ## With a namespace designated in the XML
> ## Neither of the next two work, though I thought the second should
> xml_find_all(with_ns, "//WorkSet//Description")
{xml_nodeset (0)}
> xml_find_all(with_ns, "/WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (0)}
> ## Using xml_ns_strip() works as predicted
> xml_find_all(xml_ns_strip(with_ns), "//WorkSet//Description")
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
> ## I was surprised to find the incorrect namespace value did not matter
> xml_find_all(no_ns, "//WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
> ## This also seems to ignore the namespace argument value
> xml_find_all(xml_ns_strip(with_ns), "/WorkSet//Description", ns = 
+ xml_ns(with_ns))
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
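
One possible reading of the output above, offered as a sketch rather than a definitive answer: XPath 1.0 has no notion of a default namespace, so an unprefixed step such as WorkSet only matches elements that are in no namespace. xml2 reports a default namespace under a generated prefix (d1), and that prefix has to appear in the path itself. Since ns = xml_ns(x) is already the default of xml_find_all(), passing it explicitly may look as if it were ignored. The XML text below is an assumed reconstruction of with_ns_xml; only the element names and the namespace URI come from this thread.

```r
library(xml2)

# Assumed reconstruction of the namespaced document discussed in this thread.
with_ns <- read_xml(paste0(
  '<WorkSet xmlns="http://labkey.org/etl/xml">',
  '<Description>MFIA 9-Plex (CharlesRiver)</Description>',
  '</WorkSet>'
))

# xml_ns() lists the namespaces; a default namespace gets the prefix d1.
ns <- xml_ns(with_ns)

# Unprefixed steps only match elements that are in no namespace.
unprefixed <- xml_find_all(with_ns, "/WorkSet//Description", ns = ns)

# Prefixing every step with d1 selects the namespaced elements.
prefixed <- xml_find_all(with_ns, "/d1:WorkSet//d1:Description", ns = ns)

length(unprefixed)
xml_text(prefixed)
```

Here unprefixed should be an empty node set, matching the {xml_nodeset (0)} results above, while prefixed should hold the single Description node.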
R. Mark Sharp, Ph.D.
msh...@txbiomed.org





> On Jan 31, 2017, at 5:52 PM, Hadley Wickham  wrote:
>
> I think you want
>
> x <- read_xml('
>  <WorkSet xmlns="http://labkey.org/etl/xml">
>  <Description>MFIA 9-Plex (CharlesRiver)</Description>
>  </WorkSet>
> ')
>
> The collapse argument does not do what you think it does.
>
> Hadley

Re: [R] Failure to understand namespaces in XML::getNodeSet

2017-01-31 Thread Hadley Wickham
I think you want

x <- read_xml('
  <WorkSet xmlns="http://labkey.org/etl/xml">
  <Description>MFIA 9-Plex (CharlesRiver)</Description>
  </WorkSet>
')

The collapse argument does not do what you think it does.

Hadley
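
To illustrate the point about collapse, here is a minimal sketch. It uses base paste() instead of str_c(), and the xml_lines vector is an assumed stand-in for what readLines() returned; neither detail comes from the original messages. The idea is that collapse is the separator string placed between the elements, so a value such as "" or "\n" yields one string that read_xml() can parse.

```r
library(xml2)

# Assumed stand-in for the character vector returned by readLines().
xml_lines <- c(
  '<WorkSet xmlns="http://labkey.org/etl/xml">',
  '<Description>MFIA 9-Plex (CharlesRiver)</Description>',
  '</WorkSet>'
)

# collapse is the separator placed between elements, so it must be a string.
one_string <- paste(xml_lines, collapse = "\n")

# A single string containing markup is parsed as literal XML.
doc <- read_xml(one_string)
xml_name(doc)
```

The parsed doc, not the character string, is what xml_find_all() expects as its first argument.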

On Tue, Jan 31, 2017 at 5:36 PM, Mark Sharp  wrote:
> Hadley,
>
> Thank you. I am able to get the xml_ns_strip() function to work with my file 
> directly so I will likely be able to reach my immediate goal.
>
> However, I still have had no success with understanding the namespace 
> problem. I am not able to use read_xml() using the object I generated for the 
> reproducible example, which is simply a character vector of length 4 having 
> the contents of the XML file as produced by readLines(). I then used dput() to 
> define the structure. The resulting structure apparently is not to the liking 
> of read_xml(). I have reproduced the necessary code here for your 
> convenience. The error is below.
>
> ##
> library(xml2)
> library(stringr)
> with_ns_xml <- c("",
>  "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
>  "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
>  "</WorkSet>")
> ## Without the str_c() collapse, it also complains of a vector of length > 1.
> read_xml(str_c(with_ns_xml, collapse = TRUE))
>
> ## produces the following error message.
> Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = 
> as_html,  :
>   Start tag expected, '<' not found [4]
>
> I have similar issues with xml2::xml_find_all
> xml_find_all(str_c(with_ns_xml, collapse = TRUE), "/WorkSet//Description")
>
> ## Produces the following error message.
> Error in UseMethod("xml_find_all") :
>   no applicable method for 'xml_find_all' applied to an object of class "character"
>
>
>
> R. Mark Sharp, Ph.D.
> msh...@txbiomed.org

Re: [R] Failure to understand namespaces in XML::getNodeSet

2017-01-31 Thread Mark Sharp
Hadley,

Thank you. I am able to get the xml_ns_strip() function to work with my file 
directly so I will likely be able to reach my immediate goal.

However, I still have had no success with understanding the namespace problem. 
I am not able to use read_xml() using the object I generated for the 
reproducible example, which is simply a character vector of length 4 having the 
contents of the XML file as produced by readLines(). I then used dput() to 
define the structure. The resulting structure apparently is not to the liking 
of read_xml(). I have reproduced the necessary code here for your convenience. 
The error is below.

##
library(xml2)
library(stringr)
with_ns_xml <- c("",
 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
 "</WorkSet>")
## Without the str_c() collapse, it also complains of a vector of length > 1.
read_xml(str_c(with_ns_xml, collapse = TRUE))

## produces the following error message.
Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = 
as_html,  :
  Start tag expected, '<' not found [4]

I have similar issues with xml2::xml_find_all
xml_find_all(str_c(with_ns_xml, collapse = TRUE), "/WorkSet//Description")

## Produces the following error message.
Error in UseMethod("xml_find_all") :
  no applicable method for 'xml_find_all' applied to an object of class "character"



R. Mark Sharp, Ph.D.
msh...@txbiomed.org





> On Jan 31, 2017, at 4:27 PM, Hadley Wickham  wrote:
>
> See the last example in ?xml2::xml_find_all or use xml2::xml_ns_strip()
>
> Hadley
>

CONFIDENTIALITY NOTICE: This e-mail and any files and/or...{{dropped:10}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure to understand namespaces in XML::getNodeSet

2017-01-31 Thread Hadley Wickham
See the last example in ?xml2::xml_find_all or use xml2::xml_ns_strip()

Hadley
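
The two options mentioned above might look roughly like the following sketch. The XML text is an assumed reconstruction of the poster's document, and the prefix "lk" is an arbitrary, hypothetical choice. As far as I can tell, xml_ns_strip() changes the document it is given rather than returning a modified copy, so the sketch parses a second copy before stripping.

```r
library(xml2)

# Assumed reconstruction of the namespaced document from the original post.
xml_in <- paste0(
  '<WorkSet xmlns="http://labkey.org/etl/xml">',
  '<Description>MFIA 9-Plex (CharlesRiver)</Description>',
  '</WorkSet>'
)

# Option 1: keep the namespace and rename the generated prefix d1.
doc <- read_xml(xml_in)
ns <- xml_ns_rename(xml_ns(doc), d1 = "lk")  # "lk" is a hypothetical prefix
kept <- xml_find_all(doc, "/lk:WorkSet//lk:Description", ns)

# Option 2: drop the default namespace so that unprefixed paths match.
# The stripping is done on a separately parsed copy of the document.
stripped_doc <- read_xml(xml_in)
xml_ns_strip(stripped_doc)
stripped <- xml_find_all(stripped_doc, "/WorkSet//Description")

xml_text(kept)
xml_text(stripped)
```

Option 1 keeps namespace information, which matters when a file mixes several namespaces; option 2 is shorter when the namespace carries no meaning for the task.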

On Tue, Jan 31, 2017 at 9:43 AM, Mark Sharp  wrote:
> I am trying to read a series of XML files that use a namespace and I have 
> failed, thus far, to discover the proper syntax. I have a reproducible 
> example below. I have two XML character strings defined: one without a 
> namespace and one with. I show that I can successfully extract the node using 
> the XML string without the namespace and fail when using the XML string with 
> the namespace.
>
> Mark
> PS I am having the same problem with the xml2 package and am hoping 
> understanding one will help with the other.
>
> ##
> library(XML)
> ## The first XML text (no_ns_xml) does not have a namespace defined
> no_ns_xml <- c("", "<WorkSet>",
>   "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
>   "</WorkSet>")
> l_no_ns_xml <- xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
>   useInternalNodes = TRUE)
> ## The node is found
> getNodeSet(l_no_ns_xml, "/WorkSet//Description")
>
> ## The second XML text (with_ns_xml) has a namespace defined
> with_ns_xml <- c("",
>  "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
>  "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
>  "</WorkSet>")
>
> l_with_ns_xml <- xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
>   useInternalNodes = TRUE)
> ## The node is not found
> getNodeSet(l_with_ns_xml, "/WorkSet//Description")
> ## I attempt to provide the namespace, but fail.
> ns <- "http://labkey.org/etl/xml"
> names(ns)[1] <- "xmlns"
> getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)
>
> R. Mark Sharp, Ph.D.
> Director of Data Science Core
> Southwest National Primate Research Center
> Texas Biomedical Research Institute
> P.O. Box 760549
> San Antonio, TX 78245-0549
> Telephone: (210)258-9476
> e-mail: msh...@txbiomed.org



-- 
http://hadley.nz



[R] Failure to understand namespaces in XML::getNodeSet

2017-01-31 Thread Mark Sharp
I am trying to read a series of XML files that use a namespace and I have 
failed, thus far, to discover the proper syntax. I have a reproducible example 
below. I have two XML character strings defined: one without a namespace and 
one with. I show that I can successfully extract the node using the XML string 
without the namespace and fail when using the XML string with the namespace.

Mark
PS I am having the same problem with the xml2 package and am hoping 
understanding one will help with the other.

##
library(XML)
## The first XML text (no_ns_xml) does not have a namespace defined
no_ns_xml <- c("", "<WorkSet>",
   "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
   "</WorkSet>")
l_no_ns_xml <- xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
   useInternalNodes = TRUE)
## The node is found
getNodeSet(l_no_ns_xml, "/WorkSet//Description")

## The second XML text (with_ns_xml) has a namespace defined
with_ns_xml <- c("",
 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
 "</WorkSet>")

l_with_ns_xml <- xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
   useInternalNodes = TRUE)
## The node is not found
getNodeSet(l_with_ns_xml, "/WorkSet//Description")
## I attempt to provide the namespace, but fail.
ns <- "http://labkey.org/etl/xml"
names(ns)[1] <- "xmlns"
getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)
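
A hedged sketch of one way the last call might be made to find the node with the XML package: the vector passed to namespaces binds a prefix to the namespace URI, but the XPath expression still has to use that prefix on every step, because XPath 1.0 treats unprefixed names as belonging to no namespace. The prefix "lk" is hypothetical, and the XML text is an assumed reconstruction of with_ns_xml.

```r
library(XML)

# Assumed reconstruction of the namespaced document from this post.
xml_in <- paste0(
  '<WorkSet xmlns="http://labkey.org/etl/xml">',
  '<Description>MFIA 9-Plex (CharlesRiver)</Description>',
  '</WorkSet>'
)
doc <- xmlParse(xml_in, asText = TRUE)

# Bind a prefix of our own choosing ("lk" is hypothetical) to the namespace URI.
ns <- c(lk = "http://labkey.org/etl/xml")

# Every namespaced step in the path carries the prefix.
nodes <- getNodeSet(doc, "/lk:WorkSet//lk:Description", namespaces = ns)
values <- vapply(nodes, xmlValue, character(1))
values
```

The prefix in the path only has to match the name given in ns; it does not need to match anything written in the XML file itself.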

R. Mark Sharp, Ph.D.
Director of Data Science Core
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msh...@txbiomed.org








