Re: [PR] Web console: use arrayIngestMode: array (druid)

via GitHub Tue, 09 Jan 2024 16:17:47 -0800


clintropolis commented on code in PR #15588:
URL: https://github.com/apache/druid/pull/15588#discussion_r1446752690



##########
web-console/src/druid-models/dimension-spec/dimension-spec.ts:
##########
@@ -64,6 +65,12 @@ export const DIMENSION_SPEC_FIELDS: Field<DimensionSpec>[] = [
     defaultValue: 'SORTED_ARRAY',
     suggestions: ['SORTED_ARRAY', 'SORTED_SET', 'ARRAY'],
   },
+  {
+    name: 'castToType',
+    type: 'string',
+    defined: typeIsKnown(KNOWN_TYPES, 'auto'),
+    suggestions: [undefined, 'ARRAY<STRING>', 'ARRAY<LONG>', 'ARRAY<DOUBLE>'],

Review Comment:
   >When you say that ARRAY<FLOAT> works too but is just mapped to 
ARRAY<DOUBLE>, are you advocating that we add it to the suggestions?
   
   Nah, I don't think it is necessary, since 'auto' handles them the same as 
`ARRAY<DOUBLE>`. I was just mentioning all of the valid strings (there are 
actually a few extras for backwards compatibility: 
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/column/Types.java#L37).
   
   >Remember that a standard dimension spec type can be flooooot (it gets 
mapped to string) but we don't want to suggest that.
   
   Yea, that is true, though this is a bit different because `castToType` only 
supports real types; if it can't parse the value, it falls back to full 'auto' 
behavior (though I suppose this could also have been an error, to avoid 
repeating the silently different behavior of the dimension schema types 
themselves).
   
   >Why would we not use 'auto' with 'castToType' everywhere 🤔 ? I really like 
it because it lets us get away from dimension types (which have a confusing 
relationship with column types).
   
   I think eventually we do want to use them everywhere, I just wasn't sure 
if now is the right time yet. There are a few reasons you might not want to 
use it today. The first, of course, is if you want MVDs, since the STRING 
column produced by 'auto' will never be an MVD. The second is if you want 
numeric columns but do not want the indexes that all 'auto' numeric columns 
have today, since these make the segments larger. Other than disk space, 
though, there probably isn't much of a penalty if you aren't actually 
filtering on the numbers, which leaves the main cost as the longer ingest time 
to build the indexes. I also plan to eventually allow 'auto' and nested 
columns to be customized, to allow things like leaving out the indexes, 
similar to what we can do today with string columns (and eventually, 
specifying alternative indexes once I implement them). Given that the second 
thing isn't much of an issue, I don't really think it would be a problem to 
switch native to always using 'auto' with 'castToType' for all cases except 
for MVDs.
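
   To make the fallback described above concrete, here is a minimal 
TypeScript sketch. It is illustrative only and is not how Druid or the web 
console implement it: `AutoDimensionSpec`, `normalizeCastToType`, 
`autoDimension`, and the alias table are hypothetical names, and the list of 
known types simply mirrors the suggestions in the diff rather than every 
string Druid accepts.

```typescript
// Hypothetical sketch; none of these names come from the Druid codebase.
type ArrayCastType = 'ARRAY<STRING>' | 'ARRAY<LONG>' | 'ARRAY<DOUBLE>';

interface AutoDimensionSpec {
  type: 'auto';
  name: string;
  castToType?: ArrayCastType;
}

// Mirrors the suggestions in the diff; assumed to be non-exhaustive.
const KNOWN_CAST_TYPES: readonly string[] = ['ARRAY<STRING>', 'ARRAY<LONG>', 'ARRAY<DOUBLE>'];

// Assumption based on the comment: ARRAY<FLOAT> is handled like ARRAY<DOUBLE>.
const CAST_TYPE_ALIASES: Record<string, ArrayCastType> = {
  'ARRAY<FLOAT>': 'ARRAY<DOUBLE>',
};

// Returns the resolved type, or undefined to signal "fall back to full 'auto' behavior".
function normalizeCastToType(raw: string | undefined): ArrayCastType | undefined {
  if (raw === undefined) return undefined;
  const resolved: string = CAST_TYPE_ALIASES[raw] ?? raw;
  return KNOWN_CAST_TYPES.includes(resolved) ? (resolved as ArrayCastType) : undefined;
}

// Builds an 'auto' dimension spec, omitting castToType when it cannot be resolved.
function autoDimension(name: string, castToType?: string): AutoDimensionSpec {
  const resolved = normalizeCastToType(castToType);
  return resolved ? { type: 'auto', name, castToType: resolved } : { type: 'auto', name };
}
```

   For example, `autoDimension('tags', 'ARRAY<FLOAT>')` yields a spec with 
`castToType: 'ARRAY<DOUBLE>'`, while `autoDimension('tags', 'flooooot')` 
yields a plain 'auto' spec with no `castToType`, which is the silent fallback 
(rather than an error) mentioned above.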



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

