[jira] [Commented] (AVRO-1360) C++ Resolving decoder is not working when reader schema has more fields than writer schema

Ramana Suvarapu (JIRA) Wed, 02 Oct 2013 11:24:38 -0700

    [ 
https://issues.apache.org/jira/browse/AVRO-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784242#comment-13784242
 ]


Ramana Suvarapu commented on AVRO-1360:
---------------------------------------

Hi Thiru,

I investigated why the resolving decoder is so slow and why it uses so much 
memory with larger schemas. When I tried ResolvingDecoder with our schema, the 
ValidatingGrammer::generate step ran indefinitely: after 2 to 3 minutes it had 
consumed all of the system's memory, and the method never returned. Here is 
the task manager image taken during the ValidatingGrammer::generate step. 

I debugged the code, and the call stack shows that a huge amount of time and 
memory is spent in boost::any and std::vector operations (constructors and 
destructors). Currently Production, RepeaterInfo and RootInfo are typedef'd as

typedef std::vector<Symbol> Production;
typedef boost::tuple<size_t, bool, Production, Production> RepeaterInfo;
typedef boost::tuple<Production, Production> RootInfo;

Since each of these collections stores objects by value, a lot of temporary 
objects are created and destroyed. Please see the attached call stack for more 
information.

I modified the Production, RepeaterInfo and RootInfo collections to store 
shared_ptrs instead of objects and changed the decoder code accordingly. When I 
ran the program again, ValidatingGrammer::generate returned its result within a 
few milliseconds (less than 10 ms).

typedef std::vector<boost::shared_ptr<Symbol> > Production;
typedef boost::tuple<size_t, bool, boost::shared_ptr<Production>,
    boost::shared_ptr<Production> > RepeaterInfo;
typedef boost::tuple<boost::shared_ptr<Production>,
    boost::shared_ptr<Production> > RootInfo;
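
To illustrate why the pointer-based typedefs help, here is a minimal sketch 
that counts element copies. It is not the Avro code: CountedSymbol and both 
function names are hypothetical stand-ins, and std::shared_ptr is used instead 
of boost::shared_ptr only so that the example needs nothing beyond the standard 
library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for avro's Symbol: it only counts how often it is copied.
struct CountedSymbol {
    static std::size_t copies;
    int kind;
    explicit CountedSymbol(int k) : kind(k) {}
    CountedSymbol(const CountedSymbol& other) : kind(other.kind) { ++copies; }
    CountedSymbol& operator=(const CountedSymbol& other) {
        kind = other.kind;
        ++copies;
        return *this;
    }
};
std::size_t CountedSymbol::copies = 0;

typedef std::vector<CountedSymbol> ValueProduction;
typedef std::vector<std::shared_ptr<CountedSymbol> > SharedProduction;

// Copies a by-value production `times` times; returns how many symbols were copied.
std::size_t copiesWhenStoredByValue(std::size_t symbols, std::size_t times) {
    ValueProduction p;
    for (std::size_t i = 0; i < symbols; ++i) {
        p.push_back(CountedSymbol(static_cast<int>(i)));
    }
    assert(p.size() == symbols);
    CountedSymbol::copies = 0;  // ignore the copies made while building p
    for (std::size_t i = 0; i < times; ++i) {
        ValueProduction copy(p);  // deep copy: one symbol copy per element
        assert(copy.size() == symbols);
    }
    return CountedSymbol::copies;
}

// Same experiment with shared pointers: only the pointers are copied.
std::size_t copiesWhenStoredByPointer(std::size_t symbols, std::size_t times) {
    SharedProduction p;
    for (std::size_t i = 0; i < symbols; ++i) {
        p.push_back(std::make_shared<CountedSymbol>(static_cast<int>(i)));
    }
    assert(p.size() == symbols);
    CountedSymbol::copies = 0;
    for (std::size_t i = 0; i < times; ++i) {
        SharedProduction copy(p);  // bumps reference counts, copies no symbols
        assert(copy.size() == symbols);
    }
    return CountedSymbol::copies;
}
```

Copying a 100-symbol production 10 times costs 1000 symbol copies by value and 
none through pointers. In the real grammar a Symbol can itself hold nested 
Productions, so each by-value copy would presumably be far more expensive than 
in this flat sketch.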

I created a patch with my changes and attached it to this email. Please take a 
look and let me know your thoughts on the changes.

After these changes, ResolvingGrammarGenerator::generate() runs faster.

The other big bottleneck is the fixup() method, which takes a long time when 
the production vector has many nested symbols. What is the significance of the 
recursive fixup() method? Can you please let me know how this method works? Is 
there any way it can be improved?



> C++ Resolving decoder is not working when reader schema has more fields than 
> writer schema
> ------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1360
>                 URL: https://issues.apache.org/jira/browse/AVRO-1360
>             Project: Avro
>          Issue Type: Bug
>          Components: c++
>    Affects Versions: 1.7.4
>            Reporter: Ramana Suvarapu
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-1360-2.patch, AVRO-1360-3.patch, AVRO-1360.patch, 
> testreader, testreader-1, testreader.hh, testwriter, testwriter-1, 
> testwriter.hh
>
>
> When the reader schema has more fields than the writer schema, the C++ 
> implementation of the resolving decoder throws the exception "Don't know how 
> to handle excess fields for reader." without checking whether the fields are 
> optional or have default values.
> Attached are the reader and writer schemas. The record in the reader schema 
> has 2 more fields than the one in the writer schema. One is a required field 
> with a default value and the other is an optional field (union of null and 
> string). Since one has a default value and the other is optional, the reader 
> and writer schemas are supposed to be compatible. 
>  
> {"name": "defaultField", "type": "string", "default": "DEFAULT", "declared":"true"},
> {"name": "optionalField", "type": ["string", "null"],"declared":"true"},
>  
> int main()
> {
>   avro::ValidSchema readerSchema = load("reader.json");
>   avro::ValidSchema writerSchema = load("writer.json");
>   avro::DecoderPtr d = avro::resolvingDecoder(writerSchema, readerSchema,
>                                               avro::binaryDecoder());
> }
>  
> But when I try to create the resolving decoder, I get "Don't know how to 
> handle excess fields for reader." The Java implementation works.  
>  
> Can you please let us know if there are any other limitations in the C++ 
> implementation of ResolvingDecoder? We are planning to use it in our project 
> and want to make sure it works as per the Avro specification.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
