Hi folks,

We've come a long way in our integration testing in the last 6 months. For those who haven't followed actively, the integration tests verify wire compatibility between the Java and C++ implementations of the streaming and file binary formats. We run these tests continuously in Travis CI, which makes it safe for developers to refactor and improve the IPC / wire format code without breaking cross-implementation compatibility. I expect that in the near future we will also see some JavaScript integration tests.
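As a rough illustration of the kind of cross-implementation round-trip check involved, here is a toy Python sketch. It does not use the real Arrow JSON or binary formats: `producer_write`, `consumer_validate`, and the length-prefixed int32 layout are all made up for illustration, and both "implementations" live in one process.

```python
import json
import struct

# Hypothetical JSON "point of truth"; NOT the real Arrow JSON test format.
TRUTH_JSON = '{"name": "f0", "values": [1, 2, 3]}'


def producer_write(json_text):
    """Stand-in for one implementation: JSON -> toy binary.

    Toy layout (an assumption): little-endian int32 count, then int32 values.
    """
    values = json.loads(json_text)["values"]
    return struct.pack("<i%di" % len(values), len(values), *values)


def consumer_validate(binary, json_text):
    """Stand-in for the other implementation: read binary, compare to JSON."""
    (count,) = struct.unpack_from("<i", binary, 0)
    decoded = list(struct.unpack_from("<%di" % count, binary, 4))
    return decoded == json.loads(json_text)["values"]


binary = producer_write(TRUTH_JSON)
ok = consumer_validate(binary, TRUTH_JSON)
```

In the real suite the producer and consumer are separate Java and C++ programs, and the check is run in both directions.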
The integration test suite works as follows:

* Test data is specified in an Arrow-specific JSON format
* Each application reads the JSON and converts it to the Arrow binary format according to its implementation
* The other application reads the binary and compares the binary to the JSON "point of truth" file

What is still missing from the current test suite:

- Dictionary tests
- Fixed size list
- Decimal tests (first available in C++ in 0.3.0)
- Unions
- Intervals

Otherwise we have pretty good coverage of the rest of the data types in https://github.com/apache/arrow/blob/master/format/Schema.fbs; a lot of work in Arrow 0.3.0 was reconciling the date and time types.

For the next release 0.5.0 the most important thing to test is dictionaries. It would be nice to test decimals also, but we have some specification work to do as the in-memory format for Decimals in C++ and Java is currently different.

I should have time to implement the C++ side of the dictionary integration tests (i.e. converting to/from the JSON format), but someone will need to implement the Java side. It would be great if we could get this done in the next few weeks if anyone has the time to do it.

Thanks
Wes