lidavidm commented on code in PR #41823:
URL: https://github.com/apache/arrow/pull/41823#discussion_r1636055253

##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -283,6 +283,77 @@ UUID
 A specific UUID version is not required or guaranteed. This extension represents
 UUIDs as FixedSizeBinary(16) with big-endian notation and does not interpret the bytes in any way.
 
+Other
+=====
+
+Other represents a type or array that one Arrow-based system received from an
+external (likely non-Arrow) system, but cannot interpret itself. In this
+case, the Other type explicitly communicates the name and presence of a field
+to downstream clients.
+
+For example:
+
+* A Flight SQL service may support connecting external databases. In this
+  case, its catalog (``GetTables`` etc.) should reflect the names and types of
+  tables in external databases. But those external systems may support types
+  it does not recognize. Instead of erroring or silently dropping columns
+  from the catalog, it can use the Other[Null] type to report that a column
+  exists with a particular name and type name in the external database; this
+  lets clients know that a column exists, but is not supported.
+
+* The ADBC PostgreSQL driver, because of how the PostgreSQL wire protocol
+  works, may get bytes for a field whose type it does not recognize (say, a
+  geospatial type). It can still return the bytes to the application which
+  may be able to parse the data itself. In that case, it can use the
+  Other[binary] type to return the column data. The Other type differentiates
+  the column from actual binary columns.
+
+Of course, the intermediate system *could* implement a custom extension type
+for these example types. But there is no way in general that every type can
+be known in advance. In such cases, the Other type allows the system to
+explicitly note that it does not support some type or field, without silently
+losing data or sending irrelevant errors. It could also pretend to support
+the types by making up extension types on the fly. But this misleads
+downstream systems who cannot tell if the type is supported or not.
+
+Extension parameters:
+
+* Extension name: ``arrow.other``.
+
+* The storage type of this extension is any type. If there is no underlying
+  data, the storage type should be Null. If there is data (because the system
+  got bytes or some other data it does not know how to interpret), the storage
+  type should preferably be binary or fixed-size binary, but may be any type.

Review Comment:
   I believe that's already addressed?
   
   > I don't want to get hung up on DuckDB specifically (they might not be interested or might be able to implement the ability for runtime-loadable extensions to customize their Arrow export representation before they get a chance to implement this), I just wanted to demonstrate an example where the arbitrary payload that an ADBC driver (or general Arrow consumer) receives is not binary.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
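
For readers following the storage-type discussion, here is a minimal sketch of the rule in the quoted hunk: Null when there is no data, binary when the producer only has raw bytes, and otherwise whatever type the payload already has (the non-binary case raised in the quoted comment). It uses only the Python standard library and models a field as a plain dict. The helper names, the `payload_kind` labels, and the JSON `type_name` metadata layout are all hypothetical and are not taken from the PR; only the extension name `arrow.other` and the `ARROW:extension:*` metadata keys come from the proposal and the Arrow format.

```python
import json

# Field-metadata keys that Arrow uses to mark extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_METADATA_KEY = "ARROW:extension:metadata"


def choose_storage(payload_kind=None):
    """Pick a storage type name following the quoted storage-type rule.

    payload_kind is a hypothetical label, not part of the proposal:
    None means no underlying data, "bytes" means raw uninterpreted bytes,
    and any other string names the Arrow type the payload already has.
    """
    if payload_kind is None:
        return "null"      # Other[Null]: only name and presence are reported
    if payload_kind == "bytes":
        return "binary"    # Other[binary]: pass the bytes through untouched
    return payload_kind    # "may be any type", e.g. a struct payload


def describe_other_field(name, external_type_name, payload_kind=None):
    """Describe a field tagged as arrow.other (illustrative dict, not a real schema)."""
    metadata = {
        EXT_NAME_KEY: "arrow.other",
        # Assumed serialization: the quoted hunk does not define a metadata format.
        EXT_METADATA_KEY: json.dumps({"type_name": external_type_name}),
    }
    return {
        "name": name,
        "storage": choose_storage(payload_kind),
        "metadata": metadata,
    }


catalog_only = describe_other_field("geom", "geometry")
with_bytes = describe_other_field("geom", "geometry", payload_kind="bytes")
```

In this sketch the catalog-only case and the pass-through case differ only in storage type, while the extension name stays the same, which is what lets a downstream client tell an unsupported column apart from an ordinary binary column.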