[ https://issues.apache.org/jira/browse/AVRO-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jack Stockdale updated AVRO-1859:
---------------------------------
    Description: 
The (potential) memory issue is illustrated by the following code, a minimal set
of steps to reproduce an issue found in a much larger real application, in which
avro buffers are read from a third-party producer using pre-configured reader
and writer schemas that are not attached to each individual avro buffer.

The cause of the original problem was traced to a change in one of these
schemas, but the sample code below shows the same effect and illustrates that,
without additional validation, a crafted input can cause unwanted behaviour in
the avro c library.

It may be that additional required validation steps are missing here; if so,
are there any examples of how validation should be performed by the
application?

Essentially, a particular avro input buffer causes the library to attempt to
allocate 18446744073709551577 bytes in realloc.

This was caught in testing of the real application because the jemalloc memory 
allocator's xmalloc setting was in use, which causes an abort on any failed 
memory allocation.

{code}
#include <string.h>

#include "avro.h"

const char* schemaBuffer =
    "{\"namespace\": \"example.avro\","
    " \"type\": \"record\","
    " \"name\": \"example\","
    " \"fields\": ["
    "    {\"name\": \"s1\", \"type\": \"string\"}"
    " ]"
    "}";

int main(int argc, char* argv[])
{
  char buffer[4];
  memset(buffer, 0, sizeof(buffer));

  buffer[0] = 0x4f;

  avro_reader_t reader = avro_reader_memory(buffer, 1);
  avro_schema_t schema;

  if (!avro_schema_from_json_length(schemaBuffer, strlen(schemaBuffer), &schema)
      && reader)
  {
    avro_value_iface_t* iface = avro_generic_class_from_schema(schema);

    if (iface)
    {
      avro_value_t row;

      avro_generic_value_new(iface, &row);

      // The following attempts to allocate memory for a string of size -39
      // bytes, represented as int64_t, in:
      //     static int read_string(avro_reader_t reader, char **s, int64_t *len)
      //
      // -39 is cast to a size_t value of 18446744073709551577, which is
      // passed as the size to realloc in:
      //     avro_default_allocator(void *ud, void *ptr, size_t osize, size_t nsize)
      //
      // An attempt is then made to allocate 18446744073709551577 bytes.

      avro_value_read(reader, &row);
    }
  }

  return 0;
}

{code}
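
For reference, the figures in the comments follow directly from the Avro binary
encoding of the single payload byte 0x4f: it is a one-byte varint holding 79,
which zigzag-decodes to -40, and the reported -39 and 18446744073709551577 are
consistent with read_string reserving one extra byte for a terminating NUL
before the signed length is cast to size_t on a 64 bit platform. A minimal
standalone sketch of that arithmetic (the NUL-terminator byte is an assumption
inferred from the figures above, not taken from the library source):

{code}
#include <stdio.h>
#include <stdint.h>

int main(void)
{
  uint64_t varint = 0x4f;  /* 79: the single payload byte, a one-byte varint */

  /* zigzag decoding, as used by the Avro binary encoding for longs */
  int64_t decoded = (int64_t)(varint >> 1) ^ -(int64_t)(varint & 1);
  printf("decoded length: %lld\n", (long long)decoded);  /* -40 */

  /* one extra byte, presumably for a NUL terminator, then the size_t cast */
  size_t nsize = (size_t)(decoded + 1);
  printf("requested allocation: %zu\n", nsize);  /* 18446744073709551577 */

  return 0;
}
{code}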
Should there be additional validation steps in application code to ensure the
buffer is valid for the schema before reading it?
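
Until the library itself rejects negative string lengths, one possible
application-side mitigation is to install a custom allocator that refuses
implausible sizes. A minimal sketch, assuming the avro_set_allocator hook
(declared in avro/allocation.h, taking a function with the same signature as
the avro_default_allocator mentioned above); the 64 MB cap is an arbitrary
illustration and would need to be sized to the application's largest
legitimate record:

{code}
#include <stdlib.h>

#include "avro.h"

/* Illustrative cap only; choose a bound appropriate to the application. */
#define MAX_AVRO_ALLOC ((size_t)64 * 1024 * 1024)

/* Mirrors the behaviour of avro_default_allocator (realloc to grow or
 * shrink, free when nsize is 0) but refuses oversized requests instead
 * of passing them through to the system allocator. */
static void*
guarded_allocator(void* ud, void* ptr, size_t osize, size_t nsize)
{
  (void)ud;
  (void)osize;

  if (nsize == 0)
  {
    free(ptr);
    return NULL;
  }
  if (nsize > MAX_AVRO_ALLOC)
  {
    return NULL;  /* surfaces as an ordinary allocation failure */
  }
  return realloc(ptr, nsize);
}

/* Call once at startup, before any other avro_* call:
 *
 *   avro_set_allocator(guarded_allocator, NULL);
 */
{code}

With such a guard in place, the crafted buffer above should make
avro_value_read fail with an error rather than forwarding the
18446744073709551577 byte request to the system allocator (or aborting under
jemalloc's xmalloc).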



> Potential Invalid memory allocation
> -----------------------------------
>
>                 Key: AVRO-1859
>                 URL: https://issues.apache.org/jira/browse/AVRO-1859
>             Project: Avro
>          Issue Type: Bug
>          Components: c
>    Affects Versions: 1.8.1
>         Environment: linux 64 bit
>            Reporter: Jack Stockdale
>


