pitrou commented on code in PR #37935:
URL: https://github.com/apache/arrow/pull/37935#discussion_r1340228329
##########
docs/source/format/Integration.rst:
##########
@@ -20,32 +20,97 @@
 Integration Testing
 ===================

+To ensure Arrow implementations are interoperable between each other,
+the Arrow project includes cross-language integration tests which are
+regularly run as Continuous Integration tasks.
+
+The integration tests exercise compliance with several Arrow specifications:
+the :ref:`IPC format <format-ipc>`, the :ref:`Flight RPC <flight-rpc>` protocol,
+and the :ref:`C Data Interface <c-data-interface>`.
+
+Strategy
+--------
+
 Our strategy for integration testing between Arrow implementations is:

 * Test datasets are specified in a custom human-readable, JSON-based format
-  designed exclusively for Arrow's integration tests
-* Each implementation provides a testing executable capable of converting
-  between the JSON and the binary Arrow file representation
-* Each testing executable is used to generate binary Arrow file representations
-  from the JSON-based test datasets. These results are then used to call the
-  testing executable of each other implementation to validate the contents
-  against the corresponding JSON file.
-  - *ie.* the C++ testing executable generates binary arrow files from JSON
-    specified datasets. The resulting files are then used as input to the Java
-    testing executable for validation, confirming that the Java implementation
-    can correctly read what the C++ implementation wrote.
+  designed exclusively for Arrow's integration tests.
+
+* Each implementation provides entry points capable of converting
+  between the JSON and the Arrow in-memory representation, and of exposing
+  Arrow in-memory data using the desired format.
+
+* Each format (whether Arrow IPC, Flight or the C Data Interface) is tested for
+  all supported pairs of (producer, consumer) implementations. The producer
+  typically reads a JSON file, converts it to in-memory Arrow data, and exposes
+  this data using the format under test. The consumer reads the data in the
+  said format and converts it back to Arrow in-memory data; it also reads
+  the same JSON file as the producer, and validates that both datasets are
+  identical.
+
+* Each (producer, consumer) pair is tested over a range of JSON files
+  representing different data type categories, such as numerics, lists, etc.
+  This makes it easier to pinpoint incompatibilities than if all data types
+  were represented in a single file.
+
+Example: IPC format
+~~~~~~~~~~~~~~~~~~~
+
+Let's say we are testing Arrow C++ as a producer and Arrow Java as a consumer
+of the Arrow IPC format. Testing a JSON file would go as follows:
+
+#. A C++ executable reads the JSON file, converts it into Arrow in-memory data
+   and writes a Arrow IPC file (the file paths are typically given on the command

Review Comment:
Hmm, is your suggestion a typo? Did you mean "an"?
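The (producer, consumer) test matrix described in the quoted documentation can be illustrated with a minimal Python sketch. It is not the actual Arrow integration harness: the implementation names, the JSON file names and the `run_case` callback (standing in for the produce, consume and compare steps) are hypothetical. This sketch also pairs each implementation with itself; the quoted text only says "all supported pairs".

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# One test case: (producer implementation, consumer implementation, JSON file).
Case = Tuple[str, str, str]


def build_test_matrix(implementations: Sequence[str],
                      json_files: Sequence[str]) -> List[Case]:
    """List every (producer, consumer, json_file) combination to run."""
    return [
        (producer, consumer, json_file)
        # product(..., repeat=2) yields every ordered pair, including pairs
        # where the producer and the consumer are the same implementation.
        for producer, consumer in product(implementations, repeat=2)
        for json_file in json_files
    ]


def run_matrix(cases: List[Case],
               run_case: Callable[[str, str, str], bool]) -> List[Case]:
    """Run every case and return the ones that failed.

    `run_case` is a hypothetical callback: the producer would convert the
    JSON file to Arrow data and expose it in the format under test, and the
    consumer would read it back and compare it with the same JSON file.
    """
    return [case for case in cases if not run_case(*case)]


if __name__ == "__main__":
    # Hypothetical implementation names and JSON files, for illustration only.
    cases = build_test_matrix(["C++", "Java"], ["primitive.json", "nested.json"])
    print(len(cases))  # prints 8: 2 producers x 2 consumers x 2 files
```

Because each JSON file covers a single data type category, a failing case names the producer, the consumer and the category at once, which is the motivation the documentation gives for splitting test data across files.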
