[ 
https://issues.apache.org/jira/browse/AVRO-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985916#comment-16985916
 ] 

Ryan Skraba commented on AVRO-2644:
-----------------------------------

{quote} Would a PR be accepted that enforces LANG=C semantics or would that 
have to be shipped as a breaking change?
{quote}
In my opinion, this would be a welcome and non-breaking improvement.  Java's 
File.listFiles() makes no guarantee about ordering, and on Linux it eventually 
ends up at a 
[readdir|http://man7.org/linux/man-pages/man3/readdir.3.html]: "The order in 
which filenames are read by successive calls to readdir() depends on the 
filesystem implementation; it is unlikely that the names will be sorted in any 
fashion."

It is _possible_ that some existing use of avro-tools currently works and would 
break after sorting. As your case demonstrates, though, any such use currently 
depends on good luck and the whim of readdir.  Sorting would ensure that the 
same avro-tools command line works or breaks consistently, regardless of the 
filesystem.

I would estimate that the majority of scripted use of the Java schema compiler 
is through the maven plugin, which would be unaffected.

The default ordering of Java Strings (String.compareTo) is independent of 
locale: it compares UTF-16 code units, which matches LANG=C byte order for 
ASCII file names and is close enough otherwise, so that sounds good to me.
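
As a rough illustration of the sorting idea, here is a minimal sketch. It is 
not the actual avro-tools code; the class and method names 
(SortedSchemaListing, listSchemaFiles) are made up for this example. It sorts 
on the file name with String.compareTo rather than on the File itself, since 
File.compareTo is platform-dependent (case-insensitive on Windows).

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not the avro-tools implementation.
class SortedSchemaListing {

    // Lists the *.avsc files directly inside dir in a deterministic order.
    static List<File> listSchemaFiles(File dir) {
        // listFiles() gives no ordering guarantee and returns null on error.
        File[] files = dir.listFiles((d, name) -> name.endsWith(".avsc"));
        if (files == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        // String.compareTo is locale-independent (UTF-16 code unit order).
        Arrays.sort(files, Comparator.comparing(File::getName));
        return Arrays.asList(files);
    }

    public static void main(String[] args) throws IOException {
        // Recreate the layout from the issue in a temporary directory.
        Path dir = Files.createTempDirectory("schemas");
        Files.createFile(dir.resolve("Parent.avsc"));
        Files.createFile(dir.resolve("Component.avsc"));

        for (File f : listSchemaFiles(dir.toFile())) {
            System.out.println(f.getName());
        }
    }
}
```

With this ordering, Component.avsc is always listed before Parent.avsc, so the 
example in the issue would compile on every filesystem. Schemas whose 
dependencies do not happen to sort first would still need to be listed 
explicitly.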

The *current workaround* is to explicitly specify the order of the files to be 
compiled on the command line, which should continue to work after sorting.

> Non-Deterministic avsc Directory Compilation
> --------------------------------------------
>
>                 Key: AVRO-2644
>                 URL: https://issues.apache.org/jira/browse/AVRO-2644
>             Project: Apache Avro
>          Issue Type: Bug
>            Reporter: Austin Cawley-Edwards
>            Priority: Minor
>
> We're trying to use the `compile \{src dir} \{output dir}` command in
> `avro-tools` and finding that there are some non-deterministic behaviors
> between systems, depending on how the OS sorts files.
>
> Example:
>  schemas/Component.avsc
>    - defines Component record type in the namespace `com.test`
>  schemas/Parent.avsc
>    - defines a Parent record, in the same `com.test` namespace, with a
>      field of type `com.test.Component`
>
> With the command, `java -jar avro-tools-1.9.1.jar compile schemas/ out-dir/`,
> some systems compile the directory in the order Component, Parent while
> others compile in the order Parent, Component. The latter fails as Component
> has not been defined when it is referenced by Parent.
>
> We have also tried using the IDL and importing the dependency types, and then
> converting them to avsc, and finally compiling the entire directory, but that
> fails as the generated avsc files embed/duplicate the "Component" types each
> time it is used.
>
> OS:
>  Linux 857aaf92e059 4.15.0-70-generic #79-Ubuntu SMP Tue Nov 12 10:36:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> Avro:
>  version 1.9.1
>  
>  
> Would a PR be accepted that enforces LANG=C semantics or would that have to 
> be shipped as a breaking change?
>  
> Coming from this thread in the mailing list: 
> [http://mail-archives.apache.org/mod_mbox/avro-user/201911.mbox/%3CCALGL%2BUDH03pCyKAQ5a%2B_fvwnUVougwwEXe8%2BHFAuR8Q%3D2cqYmw%40mail.gmail.com%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
