emilk opened a new issue, #8351:
URL: https://github.com/apache/arrow-rs/issues/8351

   The `Display` implementation for `DataType` produces poor output for 
several types, including:
   
   * FixedSizeBinary
   * Timestamp
   * Struct (non-reversible / lossy)
   * Union
   * Dictionary
   * Decimal*
   * Map
   * RunEndEncoded
   
   We need to have a good overall design for these, and then implement it.
   
   ## Design considerations
   ### Readable
   The output should be short and readable for common cases, e.g. 
`List<nullable u8>`
   
   ### Reversible
   We should be able to `parse` back the original `DataType`.
   
   Open question: Should that also include metadata on any embedded `Field`s?
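
To make the round-trip requirement concrete, here is a minimal sketch using a toy type. `ToyType` and its variants are hypothetical stand-ins, not arrow's `DataType`, and the `List<nullable u8>` syntax is only the example spelling mentioned above, not a settled design. The point is the property `parse(display(t)) == t`.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for a data type; NOT arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    U8,
    Utf8,
    List { nullable: bool, item: Box<ToyType> },
}

impl fmt::Display for ToyType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyType::U8 => write!(f, "u8"),
            ToyType::Utf8 => write!(f, "utf8"),
            ToyType::List { nullable, item } => {
                // Only print nullability when it is set, to keep common cases short.
                let prefix = if *nullable { "nullable " } else { "" };
                write!(f, "List<{}{}>", prefix, item)
            }
        }
    }
}

impl FromStr for ToyType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim();
        match s {
            "u8" => return Ok(ToyType::U8),
            "utf8" => return Ok(ToyType::Utf8),
            _ => {}
        }
        // A list is `List<` + optional `nullable ` + inner type + `>`.
        let inner = s
            .strip_prefix("List<")
            .and_then(|rest| rest.strip_suffix('>'))
            .ok_or_else(|| format!("unrecognized type: {:?}", s))?;
        let (nullable, inner) = match inner.strip_prefix("nullable ") {
            Some(rest) => (true, rest),
            None => (false, inner),
        };
        let item = Box::new(inner.parse::<ToyType>()?);
        Ok(ToyType::List { nullable, item })
    }
}
```

A test suite for the real implementation could assert the same property for every `DataType` variant, which would catch lossy cases such as the current `Struct` output.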
   
   ### Safe
   Strings like field names need to be escaped to avoid string injection bugs 
(e.g. strings containing commas, quotes, newlines, …). We could consider 
omitting the quotes for the common case of "safe" strings 
(`[_a-zA-Z][_a-zA-Z0-9]*`).
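
As a rough sketch of the quoting rule, the helper below prints a field name bare when it is a plain ASCII identifier and otherwise wraps it in double quotes with backslash escapes. The function names are hypothetical, and both the escape set and the assumption that "safe" means `[_a-zA-Z][_a-zA-Z0-9]*` are illustrative choices, not a decided design.

```rust
/// Returns true if `name` matches `[_a-zA-Z][_a-zA-Z0-9]*` and can be printed bare.
/// (Assumed definition of a "safe" string; hypothetical helper.)
fn is_safe_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c == '_' || c.is_ascii_alphabetic() => {}
        // Empty names and names starting with anything else need quoting.
        _ => return false,
    }
    chars.all(|c| c == '_' || c.is_ascii_alphanumeric())
}

/// Formats a field name: bare when safe, otherwise double-quoted with escapes.
fn format_field_name(name: &str) -> String {
    if is_safe_identifier(name) {
        return name.to_string();
    }
    let mut out = String::with_capacity(name.len() + 2);
    out.push('"');
    for c in name.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            other => out.push(other),
        }
    }
    out.push('"');
    out
}
```

With this rule, `id` stays `id`, while a name such as `a, b` becomes `"a, b"`, so a comma inside a field name can no longer be mistaken for a field separator. A parser would need to apply the inverse unescaping for the output to stay reversible.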
   
   ### Consistent
   We currently use parentheses for complex datatypes, e.g. `List(UInt8)` and 
`Struct("field": UInt8)`.
   We can switch that for `[]`, `{}`, or `<>`, but I believe we should use the 
same thing for every type, i.e. NOT mix `List<u8>` and `Struct { … }`.
   
   ### Familiar
   We currently use long names such as `UInt8`, which are familiar to users of 
other Arrow libraries (e.g. pyarrow),
   but we could consider using the shorter `u8`, which is more familiar to Rust 
users.
   
   
   ## Related issues
   * https://github.com/apache/arrow-rs/issues/7048

