[ https://issues.apache.org/jira/browse/AVRO-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784242#comment-13784242 ]
Ramana Suvarapu commented on AVRO-1360:
---------------------------------------

Hi Thiru,

I investigated why the resolving decoder is slow and why it uses so much memory with larger schemas. When I create a ResolvingDecoder with our schema, the ValidatingGrammer::generate step takes forever: after 2 to 3 minutes it has consumed all of the system's memory, and the method never returns. Here is the task manager image during the ValidatingGrammer::generate step.

I debugged the code, and the call stack shows that a huge amount of time and memory is spent in boost::any and std::vector operations (constructors and destructors). Currently Production, RootInfo and RepeaterInfo are defined as

    typedef std::vector<Symbol> Production;
    typedef boost::tuple<size_t, bool, Production, Production> RepeaterInfo;
    typedef boost::tuple<Production, Production> RootInfo;

Since each of these collections stores objects by value, a lot of temporary objects are created and destroyed. Please find the attached call stack for more information.

I changed the Production, RepeaterInfo and RootInfo collections to store shared_ptrs instead of objects and modified the decoder code accordingly. When I ran the program with these changes, ValidatingGrammer::generate returned its result within a few milliseconds (less than 10 ms).

    typedef std::vector<boost::shared_ptr<Symbol> > Production;
    typedef boost::tuple<size_t, bool, boost::shared_ptr<Production>, boost::shared_ptr<Production> > RepeaterInfo;
    typedef boost::tuple<boost::shared_ptr<Production>, boost::shared_ptr<Production> > RootInfo;

I created a patch with my changes and attached it to this email. Please take a look and let me know your thoughts on the changes.

After these changes, ResolvingGrammarGenerator::generate() executes faster. Another big bottleneck is the fixup() method, which takes a lot of time when the production vector has many nested symbols.
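
As a rough illustration of the copy overhead described above for the value-based typedefs, here is a minimal sketch. It is not the Avro code: the Symbol struct, its copy counter, and the two helper functions are hypothetical, and std::shared_ptr (C++11) stands in for boost::shared_ptr so the example needs no Boost headers. It only shows that copying a value-based production copies every Symbol in it, while copying a shared_ptr-based production copies pointers only.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for Avro's Symbol; it counts how often it is copied.
struct Symbol {
    static std::size_t copies;
    int kind;
    explicit Symbol(int k) : kind(k) {}
    Symbol(const Symbol& other) : kind(other.kind) { ++copies; }
    Symbol& operator=(const Symbol& other) {
        kind = other.kind;
        ++copies;
        return *this;
    }
};
std::size_t Symbol::copies = 0;

// Assumed shapes of the two designs: elements by value vs. by shared pointer.
typedef std::vector<Symbol> ValueProduction;
typedef std::vector<std::shared_ptr<Symbol> > SharedProduction;

// Returns how many Symbol copies one copy of a value-based production costs.
std::size_t copiesForValueProduction(std::size_t n) {
    ValueProduction p;
    p.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        p.emplace_back(static_cast<int>(i));
    }
    Symbol::copies = 0;            // ignore anything done while building p
    ValueProduction duplicate(p);  // deep copy: every Symbol is copied
    (void)duplicate.size();
    return Symbol::copies;
}

// Returns how many Symbol copies one copy of a pointer-based production costs.
std::size_t copiesForSharedProduction(std::size_t n) {
    SharedProduction p;
    p.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        p.push_back(std::make_shared<Symbol>(static_cast<int>(i)));
    }
    Symbol::copies = 0;
    SharedProduction duplicate(p);  // copies pointers and bumps ref counts only
    (void)duplicate.size();
    return Symbol::copies;
}
```

With these helpers, copiesForValueProduction(1000) reports 1000 Symbol copies and copiesForSharedProduction(1000) reports none. In a grammar where productions are nested inside tuples and copied repeatedly, the value-based cost multiplies at every level, which is consistent with the behaviour seen in the call stack, although the sketch does not measure the actual Avro code.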
What is the significance of the recursive fixup() method? It also takes a long time to execute. Can you please let me know how this method works? Is there any way it can be improved?

> C++ Resolving decoder is not working when reader schema has more fields than writer schema
> ------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1360
>                 URL: https://issues.apache.org/jira/browse/AVRO-1360
>             Project: Avro
>          Issue Type: Bug
>          Components: c++
>    Affects Versions: 1.7.4
>            Reporter: Ramana Suvarapu
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-1360-2.patch, AVRO-1360-3.patch, AVRO-1360.patch, testreader, testreader-1, testreader.hh, testwriter, testwriter-1, testwriter.hh
>
>
> When the reader schema has more fields than the writer schema, the C++
> implementation of the resolving decoder throws the exception
> "Don't know how to handle excess fields for reader." without checking
> whether the fields are optional or have default values.
> Attached are the reader and writer schemas. The record in the reader schema has 2
> more fields than the writer schema. One is a required field with a
> default value and the other is an optional field (union of null and string).
> Since one has a default value and the other is optional, the reader and writer
> schemas are supposed to be compatible.
>
> {"name": "defaultField", "type": "string", "default": "DEFAULT", "declared":"true"},
> {"name": "optionalField", "type": ["string", "null"],"declared":"true"},
>
> main()
> {
>   avro::ValidSchema readerSchema = load("reader.json");
>   avro::ValidSchema writerSchema = load("writer.json");
>   avro::DecoderPtr d = avro::resolvingDecoder(writerSchema,
>     readerSchema,avro::binaryDecoder());
> }
>
> But when I tried to create the resolving decoder, I got "Don't know how to
> handle excess fields for reader." The Java implementation works.
>
> Can you please let us know if there are any other limitations with the C++
> implementation of ResolvingDecoder? We are planning to use it in our project
> and we want to make sure it works as per the Avro specification.

--
This message was sent by Atlassian JIRA
(v6.1#6144)