Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/594
Given all the above, there is a very simple fix for the particular case that
this bug covers.
{code}
private void writeDataAllText(MapWriter map, FieldSelection selection,
    ...
    case VALUE_NULL:
      // Here we do have a type. This is a null VarChar.
      handleNullString(map, fieldName);
      break;
    ...

/**
 * Create a VarChar column. No need to explicitly set a
 * null value; nulls are the default.
 * <p>
 * Note: This only works for all-text mode because we can
 * predict that, if we ever see an actual value, it will be
 * treated as a VarChar. This trick <b>will not</b> work for the
 * general case because we cannot predict the actual column
 * type.
 * @param writer the map writer in which to create the column
 * @param fieldName the name of the field whose value is null
 */
private void handleNullString(MapWriter writer, String fieldName) {
  writer.varChar(fieldName);
}
{code}
The above simply leverages the existing mechanism for mapping columns to
types, and for filling in missing null values.
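To make the idea concrete, here is a minimal, self-contained sketch of the "declare the column, let nulls default" approach. It is only an illustration under stated assumptions: {{ToyMapWriter}}, {{writeString}}, {{endRow}}, {{column}} and {{NullStringDemo}} are hypothetical names invented for this example and are not Drill's {{MapWriter}} API. The sketch assumes, as described above, that a column declared without a value reads back as null for every row that was never written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy writer; a stand-in for illustration, NOT Drill's MapWriter. */
class ToyMapWriter {
    /** Column name -> values written so far. Unwritten rows are treated as null. */
    private final Map<String, List<String>> varCharColumns = new LinkedHashMap<>();
    private int rowCount = 0;

    /** Declares (or looks up) a VarChar column without writing any value. */
    List<String> varChar(String fieldName) {
        return varCharColumns.computeIfAbsent(fieldName, k -> new ArrayList<>());
    }

    /** Writes a real value, back-filling nulls for rows that were skipped. */
    void writeString(String fieldName, String value) {
        List<String> col = varChar(fieldName);
        while (col.size() < rowCount) {
            col.add(null);
        }
        col.add(value);
    }

    /** Marks the end of the current row. */
    void endRow() {
        rowCount++;
    }

    /** Returns the column padded with nulls to the current row count, or null if never declared. */
    List<String> column(String fieldName) {
        List<String> col = varCharColumns.get(fieldName);
        if (col == null) {
            return null;
        }
        List<String> padded = new ArrayList<>(col);
        while (padded.size() < rowCount) {
            padded.add(null);
        }
        return padded;
    }
}

class NullStringDemo {
    /** Three rows with a null c1, then one row with a real string. */
    static ToyMapWriter run() {
        ToyMapWriter writer = new ToyMapWriter();
        for (int i = 0; i < 3; i++) {
            // Null in all-text mode: only declare the column as VarChar.
            writer.varChar("c1");
            writer.endRow();
        }
        writer.writeString("c1", "Hello World");
        writer.endRow();
        return writer;
    }

    public static void main(String[] args) {
        // prints [null, null, null, Hello World]
        System.out.println(run().column("c1"));
    }
}
```

In this toy version the column exists with a known type from the first null onward, so no type has to be guessed later. That only holds because all-text mode guarantees that any real value will be a string.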
Output, when printing {{tooManyNulls.json}} to CSV:
{code}
4096 row(s):
c1
null
...
null
1 row(s):
c1
Hello World
Total rows returned : 4097. Returned in 242ms.
{code}
Performance here will be slower than master because we now do a field
lookup for each null column, where in the past we did not. The performance of
null columns, however, should be identical to that of non-null columns. The
performance of the above fix should also be identical to that of the fix
proposed in this PR, but the code here is simpler.