Hi, Renato. Thanks for your answer.
About "content": ["null","bytes"]: I was reasoning from implementing only
["null","string"], but if we implement ["null","type"] as in the Cassandra
module, all is fine. *But* the title of the issue must be changed :)
So Accumulo, DynamoDB, etc. fulfil ["null","type"]. And this brings us to
defining what "type" is: records included or excluded?

About "metadata" in Nutch's WebPage: at the moment it is mandatory, but in
NUTCH-1477 [1] it was declared as optional. This is related to the question
right after this line.

With ["null","type"] including records, nesting and recursion come into
play. So the question is: where to cut? In order of completeness:

1.- Only ["null","string"] in the first level of the record (no nesting)?
2.- Full ["null","string"]?
3.- ["null","type"] except records?
4.- ["null","type"] and only 1 level of nested records? (if records are
    optional, Nutch will need at least this)
5.- Full ["null","type"]?

Lewis talked about ["null","string"]; I guess he meant "2.- full
[null,string]". What you described seems like "3.- [null,type] except
records". I always (wrongly) thought about "5.- full [null,type]".

I proposed the modifications to the test classes for "2.- full
[null,string]", but I think you would like test modifications for "3.-
[null,type] except records". There is no problem in making the
modifications for "3.-".

Am I wrong in my thoughts?

Thanks!

Regards,

Alfonso Nishikawa

[1] - https://issues.apache.org/jira/browse/NUTCH-1477

2013/5/7 Renato Marroquín Mogrovejo <[email protected]>

> Hi Alfonso,
>
> First of all, thanks for pushing this issue!
>
>
> 2013/5/7 Alfonso Nishikawa <[email protected]>:
> > Hi all,
> >
> > In order to accomplish GORA-174 ([0] GORA compiler does not handle
> > ["string", "null"] unions in the AVRO schema), it has been noticed by
> > Lewis that we ("I" specially ;) should stick to the requirements of
> > the issue. With no doubt this is true!
> >
> > I would want to open a short (short short!)
> > debate about that specification because I feel reluctant until there
> > is an acknowledgement (and Lewis suggested asking everyone). Here is
> > Nutch's WebPage schema as an example:
> >
> > {
> >   "type": "record",
> >   "name": "WebPage",
> >   "namespace": "org.apache.gora.examples.generated",
> >   "fields" : [
> >     {"name": "url", "type": "string"},
> >     {"name": "content", "type": ["null","bytes"]},
> >     {"name": "parsedContent", "type": {"type":"array", "items": "string"}},
> >     {"name": "outlinks", "type": {"type":"map", "values":"string"}},
> >     {"name": "metadata", "type": {
> >       "name": "Metadata",
> >       "type": "record",
> >       "namespace": "org.apache.gora.examples.generated",
> >       "fields": [
> >         {"name": "version", "type": "int"},
> >         {"name": "data", "type": {"type": "map", "values": "string"}}
> >       ]
> >     }}
> >   ]
> > }
> >
> > At this moment I saw that in the original issue NUTCH-1477 [1] the
> > problem was about a ["null","bytes"], so I think we must not stick to
> > solving only ["null","string"].
>
> I thought we were solving single-type union types. So there shouldn't
> be a difference in persisting ["null","bytes"] or ["null","string"] as
> they are both single-type unions. In Gora-Cassandra, we serialize
> everything into bytes, and then, depending on the schema, we retrieve it
> as required. We don't need metadata at this point because the value
> will be null, or whatever else.
>
> > In the schema shown here, "metadata" is mandatory and GORA-174 does
> > not talk about optional records. Maybe we should fix that too.
>
> Sorry but why is the metadata field required? Is it because of Nutch
> or anything implicit in Avro?
>
> > One more thing: the ["null","string"] requirement implies that nested
> > records must handle it too. In the example above, "Metadata : data"
> > should allow a map of ["null","string"], and, *supposing "Metadata :
> > version" was a String*, "Metadata : version" should be allowed to be
> > of type ["null","string"].
>
> This is true, we should test if our current approaches solve this as
> well, and if not, then they would be incomplete. We will have to go
> over that again in Gora-Cassandra ):
>
> > If this is not desired, we will have to redefine the issue
> > requirements. For example, something like: "allow [null,String] on
> > topmost record fields".
> >
> > ===============
> > Taking ONLY the GORA-174 title, ["null","string"], I will have to make
> > these modifications:
> >
> > - Modify Nutch's webpage.avsc. "Content" will have to be mandatory :(
>
> Why is this? I mean making "content" mandatory
>
> > - Modify tests. Specifically testGetNested() to check nested
> > ["null","string"]. I think the Cassandra module will not pass this
> > test.
>
> Yeah mate, this is true. I think if we are "supporting" single-type
> unions, then nested records for this feature should be supported as
> well.
>
> > ===============
> >
> > Lewis talked about creating other issues for nested and multi-type
> > unions. It's not my view, but I agree with the common decision :)
> >
> > Opinions?
>
> I think it is better to create another issue for the nested issues as
> well, so that we can trace back changes more easily and make
> patches more digestible for people. Maybe we should just relate them
> within JIRA to know that those issues are actually related, or maybe
> mark one as depending on the other.
>
> > Thanks at least for reading and getting to this line! :)
>
> Thank you for taking the time to write this! (:
>
>
> Renato M.
>
> > Regards,
> >
> > Alfonso Nishikawa
> >
> > [0] - https://issues.apache.org/jira/browse/GORA-174
> > [1] - https://issues.apache.org/jira/browse/NUTCH-1477
> >
> > --
> > "Drinking bloody marys all night will make you feel like a corpse in the
> > morning."
>

--
"Drinking bloody marys all night will make you feel like a corpse in the
morning."
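
P.S. To make the options concrete, here is a rough sketch of how the
"metadata" entry in the "fields" array of the WebPage schema might look if
the record itself were optional (option 4) and its nested fields allowed
["null","string"]. It is only an illustration, not the actual Nutch or Gora
schema. Changing "version" to a string follows the supposition made in the
thread, and the "default": null entries are my own assumption (Avro expects
a union field's default to match the first branch of the union, which is
why "null" comes first).

```json
{"name": "metadata", "type": ["null", {
  "name": "Metadata",
  "type": "record",
  "namespace": "org.apache.gora.examples.generated",
  "fields": [
    {"name": "version", "type": ["null", "string"], "default": null},
    {"name": "data", "type": {"type": "map", "values": ["null", "string"]}}
  ]
}], "default": null}
```

Under option 3 the outer ["null", {...}] union around the record would not
be supported, and only the unions on "version" and on the map values would
remain, if nested fields are in scope at all.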

