Hello XZ developers, I'm one of Intel's participants on loan to the Standard Performance Evaluation Corporation's CPU Subcommittee. For our next SPEC CPU benchmark, xz is a candidate benchmarking component. We're reaching out due to a standards related issue in the xz source cited by Intel and IBM C standard leaders. Our subsequent fix is under consideration from the CPU subcommittee xz owner. The CPU committee would like to present it to you for your consideration and/or notification.
*At concern is the type of struct lzma_coder_s. It's typedef'd in src/common/common.h. struct lzma_coder_s is defined separately and distinctly in multiple modules. *Two of our standards stakeholders see the multiple definitions as being at odds with c99 sections 6.2.7p1,p2, 6.7.5.1p2. *Intel sees runtime errors due to struct lzma_coder_s having multiple definitions between caller and callee in the various modules. Such runtime errors have occurred as segfaults against the intel compiler, as well as with intel's biendian compiler tools. *Comments on C standards issues from our standards leaders are later in the message. *In essence, our change is twofold: 1) typedef lzma_coder as a void. Parent data structures (struct lzma_next_coder_s and struct lzma_internal_s) refer to a void pointer and are in line with c99 6.2.7 as caller and callee will see the same definition: --- ../../liblzma/common/common.h 2013-03-06 12:55:34.000000000 -0800 +++ liblzma/common/common.h 2016-09-22 06:13:33.678979807 -0700 @@ -75,7 +75,7 @@ /// Type of encoder/decoder specific data; the actual structure is defined /// differently in different coders. -typedef struct lzma_coder_s lzma_coder; +typedef void lzma_coder; typedef struct lzma_next_coder_s lzma_next_coder; 2) Refer to the lzma_coder data structure members with an assignment from the void pointer like so: --- ../../liblzma/common/alone_decoder.c 2013-09-19 22:26:26.000000000 -0700 +++ liblzma/common/alone_decoder.c 2016-09-22 06:13:33.668979806 -0700 @@ -50,13 +50,14 @@ static lzma_ret -alone_decode(lzma_coder *coder, +alone_decode(lzma_coder *pcoder, lzma_allocator *allocator lzma_attribute((__unused__)), const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, lzma_action action) { + struct lzma_coder_s* coder = pcoder; while (*out_pos < out_size && (coder->sequence == SEQ_CODE || *in_pos < in_size)) switch (coder->sequence) { @@ -166,8 +167,9 @@ static void -alone_decoder_end(lzma_coder *coder, lzma_allocator *allocator) +alone_decoder_end(lzma_coder *pcoder, lzma_allocator *allocator) { + struct lzma_coder_s* coder = pcoder; lzma_next_end(&coder->next, allocator); lzma_free(coder, allocator); return; @@ -175,9 +177,10 @@ *Due to development constraints our base source tree is xz 5.0.5. However, the same style fix should apply to the latest stable source. *I'll post our altered sources as well as a diff to 5.0.5 base source tree on our drop box: https://dl.dropboxusercontent.com/u/18496321/xz_c99std_5.0.5base.tar.gz *I've noticed folks on the xz alias posting diffs directly, however our diff comes out to ~1800 lines. Please let us know if it's desired and warranted to add the diff to the xz-devel alias. *Here are comments from Hubert Tong, one of IBM's C standards stakeholders: <verbatim> Aliasing violations in liblzma liblzma/common/stream_decoder.c defines a type struct lzma_coder_s. liblzma/common/block_decoder.c defines a type struct lzma_coder_s. The two types are declared such that their respective sequences of member names differ, thus the types are incompatible (C99 subclause 6.2.7 paragraph 1). Note: re: the types being complete, it is clarified in C11 that being "completed anywhere within their respective translation units" is the relevant property of the types. lzma_next_coder is a typedef for struct lzma_next_coder_s from liblzma/common/common.h. struct lzma_next_coder_s from liblzma/common/common.h has a member coder of type lzma_coder *. lzma_coder is a typedef for struct lzma_coder_s. Since struct lzma_coder_s is incompatible between liblzma/common/stream_decoder.c and liblzma/common/block_decoder.c, the pointer type associated with coder is similarly incompatible (C99 subclause 6.7.5.1 paragraph 2). The incompatibility of the coder members means that their corresponding lzma_next_coder types are in turn incompatible (6.2.7/1). Under the SEQ_BLOCK_HEADER case in stream_decode, the call to lzma_block_decoder_init passes a pointer to a lzma_next_coder (liblzma/common/stream_decoder.c) object which is the block_decoder member of an object of the struct lzma_coder_s (liblzma/common/stream_decoder.c) type. lzma_block_decoder_init, through the pointer, accesses the object (e.g., to assign a pointer to the freshly allocated struct lzma_coder_s) through an lvalue of lzma_coder * (liblzma/common/block_decoder.c) type. This access has undefined behavior under C99 subclause 6.5 paragraph 7 since lzma_coder * (liblzma/common/block_decoder.c) is not sufficiently related to lzma_next_coder (liblzma/common/stream_decoder.c). The usual manifestation of the undefined behavior is that certain reads would not observe "prior" writes, writes occur "out of order", etc. </verbatim> <verbatim> modifying lzma_coder to be a typedef to void solves the specific aliasing violation I identified below by disassociating lzma_next_coder_s from the individual lzma_coder_s types. The strategy chosen had the best chance of resolving, throughout the code, similar issues due to the various lzma_coder_s types. On calls to functions sharing the same lzma_coder_s type, the conversions to and from struct lzma_coder_s * to void * and back by use of an extra variable is unfortunate; however, identifying "non-interface" functions in a reliable manner would take time we did not have. </verbatim> Thank you! -MichaelC Michael Royce Carroll Software Performance Engineer michael dot r dot carroll at intel dot com