On 2020/10/05 1:57, Daniel Shahaf wrote:
> Yasuhito FUTATSUKI wrote on Sun, 04 Oct 2020 21:56 +0900:

>> On 2020/09/26 19:12, Daniel Shahaf wrote:
>>>      1      % svn propset svn:ignore "予定表.txt" ./ 
>>>      2      property 'svn:ignore' set on '.'
>>>      3      % svn propset foo:ignore "予定表.txt" ./ 
>>>      4      property 'foo:ignore' set on '.'
>>>      5      % LC_ALL=ja_JP.eucjp svn pl -v
>>>      6      Properties on '.':
>>>      7        foo:ignore
>>>      8          予定表.txt
>>>      9        svn:ignore
>>>     10          ͽɽ.txt
>>>
>>>     11      % LC_ALL=C svn pg --strict svn:ignore
>>>     12      {U+4E88}{U+5B9A}{U+8868}.txt
>>>
>>>     13      % svn propset svn:ignore "{U+4E88}.txt" ./ 
>>>     14      property 'svn:ignore' set on '.'
>>>     15      % sqlite3 .svn/wc.db .dump | me
>>>     16      (svn:ignore 29 {U+4E88}{U+5B9A}{U+8868}.txt )
>>>     17      % svn pg --strict svn:ignore 
>>>     18      {U+4E88}{U+5B9A}{U+8868}.txt
>>>
>>> So, I think there are a number of different issues/gotchas here:
>>>
>>> - It's not possible to get the raw value of an svn:* property in
>>>   a working copy if the value is not representable in the local encoding.  
>>
>> I believe that if we want to get property values precisely, we should
>> use XML output, although --no-newline is enough in most cases other
>> than this one.
> 
> Hmm, that's an interesting one.  On the one hand, «propget --xml»
> does resolve the ambiguity issue of the ad-hoc escaping; on the other
> hand:
> 
> - We shouldn't require CLI users to use an XML parser in order to
>   retrieve values of binary blobs.

Then do we need a new output format for "strict" values?

> - The XML document declares itself to be in UTF-8.  Does that mean XML
>   parsers are allowed to treat the dumped property values as UTF-8 and,
>   for example, convert the byte sequence (that comprises the value) to
>   another byte sequence, that's equivalent when treated as UTF-8 but
>   not equivalent when treated as binary blobs?  (For example, convert
>   the UTF-8 to composed or decomposed normal form.)

At the least, we expect the parser not to convert the byte sequence
when the value is considered safe by svn_xml_is_xml_safe(). If that
does not hold, I think the output of --xml is broken.
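
To make the normalization concern quoted above concrete, here is
a minimal Python sketch. It is not Subversion code, and the sample name
is made up for illustration. It only shows that the composed and
decomposed forms of the same text differ as byte sequences, which is
what would silently change a value that is meant to be a binary blob.

```python
import unicodedata

# Hypothetical property value: KATAKANA LETTER PU (U+30D7) plus ".txt".
name = "\u30d7.txt"

# The same text in composed (NFC) and decomposed (NFD) normal form.
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# Encode both forms as UTF-8, as an XML serializer would.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(nfc_bytes.hex(" "))
print(nfd_bytes.hex(" "))

# Equivalent as Unicode text once normalized, different as raw bytes.
print(unicodedata.normalize("NFC", nfd) == nfc)
print(nfc_bytes == nfd_bytes)
```

So a consumer that normalizes the text would hand back a value that
no longer matches the stored bytes.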

Moreover, as properties carry no metadata about their contents, we can't
determine whether a property is text or not, even if it contains only
printable characters, like 'eicar.com'[1]. So it would not be so strange
to use base64 encoding for all properties (though I don't think that is
a good idea).
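
For reference, a rough Python sketch of how a script might recover raw
byte values from XML output while honoring a base64 marker. The sample
document is hand-written for illustration, not captured from svn; the
element names and the encoding="base64" attribute reflect my
understanding of the --xml output and should be checked against real
output. The property names and the helper property_values() are
hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

# Hand-written sample (an assumption, not real svn output).
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<properties>\n'
    '<target path=".">\n'
    '<property name="foo:ignore">\u4e88\u5b9a\u8868.txt</property>\n'
    '<property name="foo:blob" encoding="base64">AAH/</property>\n'
    '</target>\n'
    '</properties>\n'
).encode("utf-8")


def property_values(xml_bytes):
    """Return {property name: raw value bytes} from an XML document."""
    root = ET.fromstring(xml_bytes)
    values = {}
    for prop in root.iter("property"):
        text = prop.text or ""
        if prop.get("encoding") == "base64":
            # Value was not XML-safe, so it is carried as base64.
            values[prop.get("name")] = base64.b64decode(text)
        else:
            # Value is carried as text; re-encode it to UTF-8 bytes.
            values[prop.get("name")] = text.encode("utf-8")
    return values


for prop_name, raw in property_values(SAMPLE).items():
    print(prop_name, raw)
```

Note that the text branch still relies on the parser not altering the
characters, which is the assumption discussed above.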

[1] https://svn.haxx.se/dev/archive-2016-03/0043.shtml
(Yes, I was also trapped by it yesterday.) 

Cheers, 
-- 
Yasuhito FUTATSUKI <futat...@yf.bsclub.org>
