[
https://issues.apache.org/jira/browse/LUCENE-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616300#comment-14616300
]
Ramkumar Aiyengar commented on LUCENE-6563:
-------------------------------------------
My point is that the layer at which a filename stops being treated as a string
and becomes just a sequence of bytes differs between OSs. On Mac OS X,
everything right down to the FS understands it as UTF-8. On Linux, the
interpretation happens only at the application layer.
Actually, in this case my terminal emulator was still in UTF-8, so it accepted
Chinese chars. These were then passed to touch as a stream of UTF-8 bytes (a
plain char* in argv to main()), and touch, regardless of the locale, just
created a filename from those bytes. Similarly, ls just read these bytes from
the directory, and only on display tried to interpret them using the locale. In
both cases, the interface to the OS passed bytes in and out and nothing more.
Java, on the other hand, uses Strings with an implicit encoding for these
inputs, and hence is forced to interpret these bytes even when they are not
being displayed or typed in.
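To illustrate the last point, here is a minimal sketch of what goes wrong once
the bytes have to pass through a String. It only simulates the decode/encode
step with an explicit charset; the real conversion happens inside the JVM's
native filesystem code, and the class and method names here (FilenameRoundTrip,
roundTrip) are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene or the JDK.
final class FilenameRoundTrip {

    // Decode raw filename bytes into a String with the given charset, then
    // encode the String back to bytes with the same charset. This mimics an
    // API that insists on Strings where the OS only deals in bytes.
    static byte[] roundTrip(byte[] raw, Charset cs) {
        return new String(raw, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file
        // encoding does not matter. These are the bytes touch would receive.
        byte[] raw = "\u4e2d\u56fd".getBytes(StandardCharsets.UTF_8);

        // With a UTF-8 charset the bytes survive the trip through a String.
        boolean utf8Ok = Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8));

        // With an ASCII charset (ANSI_X3.4-1968 is an alias for US-ASCII) the
        // non-ASCII bytes cannot be decoded, so the original name is lost.
        boolean asciiOk = Arrays.equals(raw, roundTrip(raw, StandardCharsets.US_ASCII));

        System.out.println("UTF-8 round trip preserved bytes: " + utf8Ok);
        System.out.println("US-ASCII round trip preserved bytes: " + asciiOk);
    }
}
```

A C program like touch never performs the equivalent of roundTrip, which is why
it is unaffected by the locale when creating the file.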
> MockFileSystemTestCase.testURI should be improved to handle cases where
> OS/JVM cannot create non-ASCII filenames
> ----------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-6563
> URL: https://issues.apache.org/jira/browse/LUCENE-6563
> Project: Lucene - Core
> Issue Type: Wish
> Reporter: Christine Poerschke
> Assignee: Dawid Weiss
> Priority: Minor
>
> {{ant test -Dtestcase=TestVerboseFS -Dtests.method=testURI
> -Dtests.file.encoding=UTF-8}} fails (for example) with 'Oracle Corporation
> 1.8.0_45 (64-bit)' when the default {{sun.jnu.encoding}} system property is
> (for example) {{ANSI_X3.4-1968}}
> [details to follow]