[ https://issues.apache.org/jira/browse/PHOENIX-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929826#comment-13929826 ]
James Taylor commented on PHOENIX-66:
-------------------------------------

Patch looks good, [~gabriel.reid]. Thanks so much. A couple of minor items:

- Can you use the standard JDBC APIs for instantiating the array instead of the internal PArrayDataType methods? See ArrayTest for examples.
- Do you have an error check for the array delimiter being the same as the field delimiter? That would cause issues, no?
- Can you use our tab/spacing conventions and compiler settings (see Eclipse prefs in the phoenix/dev dir)?
- We've gotten away from normalizing column names automatically, as it causes problems if folks use case-sensitive names. Would you mind updating that?
- Do all tests pass?
- Can the patch be applied to 3.0, 4.0, and master (as we have three branches now)? If not, would you mind attaching separate patches for the different branches?

[~jeffreyz] - are you ok with this change? [~mujtaba] and [~james.viole...@ds-iq.com] - any feedback on this?

> Support array creation from CSV file
> ------------------------------------
>
> Key: PHOENIX-66
> URL: https://issues.apache.org/jira/browse/PHOENIX-66
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Fix For: 3.0.0
>
> Attachments: PHOENIX-66-intermediate.patch, PHOENIX-66.patch
>
>
> We should support being able to parse an array defined in our CSV file. Perhaps something like this:
>
> a, b, c, [foo, 1, bar], d
>
> We'd know (from the data type of the column) that we have an array for the fourth field here.
>
> One option to support this would be to implement PDataType.toObject(String) for the ARRAY PDataType enums. That's not ideal, though, as we'd introduce a dependency from PDataType to our CSVLoader, since we'd need to in turn parse each element. Also, we don't have a way to pass through the custom delimiters that might be in use.
> Another pretty trivial, though a bit more constrained, approach would be to look at the column ARRAY_SIZE to control how many of the next CSV columns should be used as array elements. In this approach, you wouldn't use the square brackets at all. You can get the ARRAY_SIZE from the column metadata returned by a connection.getMetaData().getColumns() call, via resultSet.getInt("ARRAY_SIZE"). However, the ARRAY_SIZE is optional in a DDL statement, so we'd need to do something for the case where it's not specified.
>
> A third option would be to handle most of the parsing in the CSVLoader. We could use the above bracket syntax, and then collect up the next set of CSV field elements until we hit the unescaped ']'. Then we'd use our standard JDBC APIs to build the array and continue on our merry way.
>
> What do you think, [~jviolettedsiq]? Or [~bruno], maybe you can take a crack at it?

--
This message was sent by Atlassian JIRA
(v6.2#6252)
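For readers following along, the third option above can be sketched in a few lines of Java. This is only an illustrative mock-up, not Phoenix code: the class and method names are invented, and it assumes the CSV line has already been split into fields by the usual delimiter. It collects fields from an opening `[` until a field ending in an unescaped `]`, which is the point where the standard JDBC `Connection.createArrayOf` call would take over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "collect until unescaped ']'" approach.
// CsvArrayCollector and collectArrayElements are illustrative names,
// not part of the Phoenix CSVLoader.
public class CsvArrayCollector {

    /**
     * Collects array elements starting at fields.get(start), which is
     * assumed to begin with '['. Stops at the first field that ends with
     * an unescaped ']' and returns the elements with brackets stripped.
     */
    static List<String> collectArrayElements(List<String> fields, int start) {
        List<String> elements = new ArrayList<>();
        for (int i = start; i < fields.size(); i++) {
            String f = fields.get(i).trim();
            if (i == start) {
                f = f.substring(1); // drop the opening '['
            }
            // A trailing "\]" is treated as an escaped bracket, not a close.
            if (f.endsWith("]") && !f.endsWith("\\]")) {
                elements.add(f.substring(0, f.length() - 1).trim());
                return elements;
            }
            elements.add(f);
        }
        throw new IllegalArgumentException("Unterminated array literal");
    }

    public static void main(String[] args) {
        // The example record from the issue description:
        List<String> fields = Arrays.asList("a", "b", "c", "[foo", " 1", " bar]", "d");
        List<String> arr = collectArrayElements(fields, 3);
        System.out.println(arr); // prints [foo, 1, bar]

        // With a live connection, the elements would then be bound through
        // the standard JDBC API, e.g.:
        //   java.sql.Array sqlArray = conn.createArrayOf("VARCHAR", arr.toArray());
        //   stmt.setArray(4, sqlArray);
    }
}
```

The design point is the one the comment makes: keeping the bracket parsing in the CSV loader means PDataType stays ignorant of CSV concerns, and custom delimiters never have to be threaded through the type system.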