kou commented on a change in pull request #7507:
URL: https://github.com/apache/arrow/pull/7507#discussion_r468193502
##########
File path: cpp/src/arrow/ipc/options.h
##########
@@ -94,6 +94,11 @@ struct ARROW_EXPORT IpcReadOptions {
/// like decompression
bool use_threads = true;
+ /// \brief Convert endian of data element to platform-native endianness
+ /// if the endianness of the received schema is not equal to
+ /// platform-native endianness
+ bool use_native_endian = true;
Review comment:
Sorry. I suggested the `use_native_endian` name but it's not a good
name...
It looks like that "if use_native_endian is false, the read data uses non
native endian".
How about renaming it to `ensure_native_endian` or something?
##########
File path: cpp/src/arrow/type.h
##########
@@ -1582,13 +1583,26 @@ class ARROW_EXPORT FieldRef {
// ----------------------------------------------------------------------
// Schema
+enum class Endianness {
+ LITTLE = 0,
+ BIG = 1,
+#if ARROW_LITTLE_ENDIAN
+ NATIVE = LITTLE
+#else
+ NATIVE = BIG
+#endif
+};
+
/// \class Schema
/// \brief Sequence of arrow::Field objects describing the columns of a record
/// batch or table data structure
class ARROW_EXPORT Schema : public detail::Fingerprintable,
public util::EqualityComparable<Schema>,
public util::ToStringOstreamable<Schema> {
public:
+ explicit Schema(std::vector<std::shared_ptr<Field>> fields, Endianness
endianness,
+ std::shared_ptr<const KeyValueMetadata> metadata = NULLPTR);
Review comment:
Here are use cases I imagine:
* Router: It doesn't process any Arrow data and just passes the received
Arrow data to another system. In this case, it's efficient that we allow
non-native endianness in memory.
* Saver: It doesn't process any Arrow data and just saves the received
Arrow data to filesystem. For example, it saves the received Arrow data to S3
as is.
Does it make sense?
##########
File path: cpp/src/arrow/ipc/reader.cc
##########
@@ -449,7 +449,7 @@ Result<std::shared_ptr<RecordBatch>> LoadRecordBatchSubset(
const flatbuf::RecordBatch* metadata, const std::shared_ptr<Schema>&
schema,
const std::vector<bool>& inclusion_mask, const DictionaryMemo*
dictionary_memo,
const IpcReadOptions& options, MetadataVersion metadata_version,
- Compression::type compression, io::RandomAccessFile* file) {
+ Compression::type compression, io::RandomAccessFile* file, bool
swap_endian = false) {
Review comment:
I see.
How about adding a new simple internal struct (`struct IpcReadContext`?) for
organizing options (`IpcReadOptions`, `DictionaryMemo` and `swap_endian` for
now)?
I think that many arguments API isn't good to maintain.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]