Jacques, > ... it ... should probably check if a column is already setup in the output mutation and, if so, don't re-add the ensure at least one column. Thus probably requires a new method in output mutator (isEmpty or size)
Some questions on the various moving pieces: Does ensureAtLeastOneField() need to check only for the fieldWriter.integer(...) call, or does it also need to check for the earlier fieldWriter.map(...) call? (I'm not clear on what cases that loop can encounter.) Would the method go (be declared) on MapWriter (re ensureAtLeastOneField()'s fieldWriter variable) or on ComplexWriter (re its writer parameter) (or something else)? Which data should the method implementation be consulting? For example, I see that SingleMapWriter has both container and fields members, and traversing down container (at least in the current case I'm looking at) leads to fields primary, secondary, and delegate (in MapWithOrdinal). Which is the primary/best source for whether there's any column (or subcolumn?) yet?) Thanks, Daniel Jacques Nadeau wrote:
It's purpose is to make sure that a projection that finds no columns still produces output. Think record {a:4,b:6} If I say 'select c from t' , the reader (without this functionality) return zero columns (not supported). As such, ensure at least one adds a column. I believe it is still needed but should probably check if a column is already setup in the output mutation and, if so, don't re-add the ensure at least one column. Thus probably requires a new method in output mutator (isEmpty or size) On Oct 3, 2015 7:49 PM, "Daniel Barclay" <[email protected] <mailto:[email protected]>> wrote: What exactly is the purpose of ensureAtLeastOneField() org.apache.drill.exec.vector.complex.fn.JsonReader (drill/JsonReader.java at fdb6b4fecee30282d8f490e78b7f2dc3a2e27347 ยท apache/drill <https://github.com/apache/drill/blob/fdb6b4fecee30282d8f490e78b7f2dc3a2e27347/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L92>)? Is it still needed? Currently, it makes the first field of a map be a nullable-integer field if the JSON reader reads no nows. However, it does that /regardless/ of whether the first field already exists from earlier readers, causing a later reader and ScanBatch to signal a schema change when there wasn't really a schema change. This is currently causing breaking in the attempt to fix the ScanBatch/IterOutcome problems underlying DRILL-2288. Example case: - There are three JSON files. The first and last to be read have the same schema. The middle file is empty. They are all read by the same ScanBatch. - The first JSON reader sets map fields. - The second JSON reader sees no rows, so its atLeastOneWrite flag isn't set, so its ensureAtLeastOneField() thinks it needs to add a field, but forcibly sets the first field to be a nullable-int field--/regardless /of whether a first field exists, so it changes what the first reader set it to. - Then, somewhere in and/or downstream of the third reader (with in-progress ScanBatch fixes in place), Drill gets incompatible-vector errors (mentioning the second reader's NullableIntVector vs. the original reader's type for the first field) and/or schema-change-not-supported errors because ScanBatch reported OK_NEW_SCHEMA (instead of OK) when the schema didn't really change (between the first and third JSON files). Disabling ensureAtLeastOneField() eliminated the wrong-vector-type or unsupported-schema-change errors, and did not cause any new errors in the java-exec unit tests. (I haven't checked other tests yet.) Also, ensureAtLeastOneField() (or next()) has a comment about making sure there's a field in order to return a record count, but next() returns the record count. Those two things make me wonder if ensureAtLeastOneField()is now obsolete. Can it be deleted now? Or is it the case that it is still needed, but it needs to check whether there are already any fields before (currently blindly) creating one? Thanks, Daniel-- Daniel BarclayMapR Technologies
-- Daniel Barclay MapR Technologies
