Ok, thanks for the feedback. I'll look into the suggestions that you offered.

Thanks again,
Tom

On Wed, May 4, 2011 at 1:17 AM, Aki Yoshida <[email protected]> wrote:
> Hi Tom,
> I think the wrong thing about this method is that it adds an extra
> space at the beginning. If the file content is an XML and it starts
> with the xml declaration, there will be an extra space in front of the
> declaration, which violates XML well-formedness.
>
> You can create a JIRA issue for this particular bug. But this will not
> really help you in the long run. I will explain the reason below.
>
> As I understand your use case, you want to use this method for reading
> an XML file and creating its java string representation in your
> application.  As I see this method, it doesn't look like it was really
> meant to be used for such purposes. Furthermore, it seems that this
> class is only used in some unit test classes for performing a simple
> content comparison.
>
> For your particular use case, you need to take care of the character
> encoding and possibly the newline handling. This FileUtil's method
> ignores the encoding of the file.  If the file is using the utf-8
> encoding, you need to read the stream and convert it into a Java String
> using the utf-8 encoding. If it is in some other encoding like utf-16,
> iso-8859-1, etc, you need to use that encoding for conversion.
> Otherwise, you will have a corrupted String for some characters.
> Regarding the newline handling, this method currently removes all the
> CR/LFs. This is probably okay for the existing test use cases, but for
> your use case, you may want to either preserve the new line characters
> or to normalize them using the standard XML rule. So, there will be
> some other issues you will encounter if you use this simple method.
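
The encoding and newline points above can be sketched with plain JDK classes. This is only an illustration: `EncodingAwareReader`, `readFile` and `normalizeLineEndings` are hypothetical names, not part of FileUtil, and the sketch assumes the caller already knows the file's encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the library discussed here.
final class EncodingAwareReader {

    // Decode the file's bytes with the charset the caller names,
    // instead of relying on the platform default.
    static String readFile(Path path, Charset charset) throws IOException {
        return new String(Files.readAllBytes(path), charset);
    }

    // XML-style end-of-line handling: each CRLF pair and each lone CR becomes LF.
    static String normalizeLineEndings(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".xml");
        try {
            String original =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<name>Zo\u00eb</name>\r\n";
            Files.write(tmp, original.getBytes(StandardCharsets.UTF_8));

            // Matching charset: the text round-trips unchanged.
            System.out.println(readFile(tmp, StandardCharsets.UTF_8).equals(original));
            // Wrong charset: the non-ASCII character is corrupted.
            System.out.println(readFile(tmp, StandardCharsets.ISO_8859_1).equals(original));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Reading with the wrong charset does not fail; it silently yields different characters, which is why the encoding has to be chosen explicitly.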
>
> Therefore, I would recommend that you not use this FileUtil's method and
> instead use an alternative approach using the xml parser to convert a
> file for further processing (e.g., using InputSource to work on the
> Source or XMLUtils.parse() to work on the Document).
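
A minimal sketch of the parser-based approach, using only the JDK's JAXP classes. It deliberately does not call XMLUtils.parse(), whose signature is not shown in this thread; `XmlFileParser` is a hypothetical name. Handing the parser an InputSource built from a system id lets it read the raw bytes and pick up the encoding from the XML declaration itself.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only.
final class XmlFileParser {

    static Document parse(File file)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        // The parser opens the file itself, so it sees the bytes and can
        // honor the encoding named in the XML declaration.
        InputSource source = new InputSource(file.toURI().toASCIIString());
        return builder.parse(source);
    }
}
```

With this approach no intermediate String is created, so neither the charset nor the line endings have to be handled by hand.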
>
> Regards, Aki
>
> 2011/5/3 Tom Eastmond <[email protected]>:
>> That would be great to get this fixed - should I create a defect? I'd
>> also love to not have it replace a single space with 2 spaces since
>> that has caught me by surprise in my testing as well. Let me know what
>> you'd like me to do.
>>
>> Thanks again,
>> Tom Eastmond
>>
>> On Tue, May 3, 2011 at 6:19 AM, Aki Yoshida <[email protected]> wrote:
>>> Sorry,
>>> I realized this method has actually nothing to do with XML.
>>> please ignore my comments on XML normalization.
>>> regards, aki
>>>
>>> 2011/5/3 Aki Yoshida <[email protected]>:
>>>> Hi,
>>>> you are right. The normalizeCRLF() method should not add an extra
>>>> space at the beginning. We can fix this particular issue.
>>>>
>>>> But there is one open question, as the exact purpose (use case) of
>>>> this method is not clear to me. Why do we need a normalization
>>>> method that just removes all the CRs and LFs and replaces each
>>>> space/tab character with a single space, and why is it
>>>> automatically called in FileUtils.getStringFromFile()?
>>>>
>>>> Does anyone else want to have other normalization options such as
>>>> doing the standard xml white space "ignore" handling or the
>>>> end-of-line handling (i.e., replacing each CRLF pair with a single LF)?
>>>>
>>>> Regards, aki
>>>>
>>>> 2011/5/2 Tom Eastmond <[email protected]>:
>>>>> I was using the FileUtils.getStringFromFile() method for some Camel
>>>>> testing and was receiving a SAXParseException: The processing
>>>>> instruction target matching "[xX][mM][lL]" is not allowed.].
>>>>>
>>>>> It turns out that this was due to the
>>>>> FileUtils.normalizeCRLF() method which replaces whitespace characters
>>>>> (\s) with two spaces. This method prepends leading spaces to the
>>>>> contents (before the <?xml version="1.0" encoding="UTF-8"?> in this
>>>>> case) which chokes the XML parser. Would it be feasible to forgo the
>>>>> leading spaces at the start of a file in order to avoid this issue?
>>>>> I'd be happy to submit a test case/patch if this seems like a valid
>>>>> bug/fix. Please let me know if I should use another forum for this
>>>>> request.
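
The failure described above can be reproduced with the JDK's built-in parser alone. A small sketch, with `LeadingSpaceDemo` as a hypothetical name; the exact exception message may differ between parser implementations.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class for illustration only.
final class LeadingSpaceDemo {

    // Returns true if the text parses, false if the parser reports a parse error.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>";
        System.out.println(parses(doc));       // declaration is the very first thing
        System.out.println(parses(" " + doc)); // one leading space before the declaration
    }
}
```

The XML declaration is only allowed at the very start of the document, so a single space in front of it is enough to make the parser reject the input.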
>>>>>
>>>>> Thanks for the excellent work,
>>>>>
>>>>> Tom Eastmond
>>>>>
>>>>
>>>
>>
>
