[ https://issues.apache.org/jira/browse/PHOENIX-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930585#comment-13930585 ]
Gabriel Reid commented on PHOENIX-66:
-------------------------------------

{quote}
How about we do the following if the field and array delimiters are the same: 1) when we encounter an array in this case, we parse a single field for an array (or some other default behavior if it's easier to implement), and 2) log a warning when this is the case that documents (1).
{quote}

Thinking about this a bit more, having the same delimiter is actually a possible use case if quoting is used, and it will work as-is. For example, consider a table that has ID (INTEGER) and VALS (INTEGER ARRAY) fields. An import CSV file for that table could use commas as both the field and array delimiter if the input file looked like this:

{code}
1,"1,2,3,4"
2,"5,6,7"
{code}

Considering that situation, I think the most we would want to do is log a warning when both delimiters are the same.

> Support array creation from CSV file
> -------------------------------------
>
>                 Key: PHOENIX-66
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-66
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>             Fix For: 3.0.0
>
>         Attachments: PHOENIX-66-intermediate.patch, PHOENIX-66.patch
>
>
> We should support being able to parse an array defined in our CSV file. Perhaps something like this:
> a, b, c, [foo, 1, bar], d
> We'd know (from the data type of the column) that we have an array for the fourth field here.
> One option to support this would be to implement PDataType.toObject(String) for the ARRAY PDataType enums. That's not ideal, though, as we'd introduce a dependency from PDataType to our CSVLoader, since we'd need to parse each element in turn. Also, we don't have a way to pass through the custom delimiters that might be in use.
> Another pretty trivial, though a bit more constrained, approach would be to look at the column ARRAY_SIZE to control how many of the next CSV columns should be used as array elements. In this approach, you wouldn't use the square brackets at all. You can get the ARRAY_SIZE from the column metadata through a connection.getMetaData().getColumns() call, via resultSet.getInt("ARRAY_SIZE"). However, the ARRAY_SIZE is optional in a DDL statement, so we'd need to do something for the case where it's not specified.
> A third option would be to handle most of the parsing in the CSVLoader. We could use the above bracket syntax, and then collect up the next set of CSV field elements until we hit an unescaped ']'. Then we'd use our standard JDBC APIs to build the array and continue on our merry way.
> What do you think, [~jviolettedsiq]? Or [~bruno], maybe you can take a crack at it?

--
This message was sent by Atlassian JIRA
(v6.2#6252)
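As a quick illustration of the quoted-delimiter case in the comment above, and of building the array through the standard JDBC APIs mentioned in the description, here is a minimal sketch. It assumes Apache Commons CSV for the quote-aware parsing and a hypothetical table EXAMPLE (ID INTEGER NOT NULL PRIMARY KEY, VALS INTEGER ARRAY); it is not the actual CSVLoader code, just one way the pieces could fit together.

{code}
import java.io.FileReader;
import java.io.Reader;
import java.sql.Array;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class ArrayCsvSketch {

    public static void main(String[] args) throws Exception {
        // Hypothetical target table:
        //   CREATE TABLE EXAMPLE (ID INTEGER NOT NULL PRIMARY KEY, VALS INTEGER ARRAY)
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Reader in = new FileReader("input.csv");
             // CSVFormat.DEFAULT treats '"' as the quote character, so the quoted
             // second column ("1,2,3,4") comes back as a single field even though
             // the field and array delimiters are both commas.
             CSVParser parser = CSVFormat.DEFAULT.parse(in);
             PreparedStatement stmt =
                 conn.prepareStatement("UPSERT INTO EXAMPLE (ID, VALS) VALUES (?, ?)")) {

            for (CSVRecord record : parser) {
                int id = Integer.parseInt(record.get(0));

                // Split the quoted field on the array delimiter to get the elements.
                String[] parts = record.get(1).split(",");
                Integer[] elements = new Integer[parts.length];
                for (int i = 0; i < parts.length; i++) {
                    elements[i] = Integer.valueOf(parts[i].trim());
                }

                // Build the array with the standard JDBC API and bind it.
                Array vals = conn.createArrayOf("INTEGER", elements);
                stmt.setInt(1, id);
                stmt.setArray(2, vals);
                stmt.execute();
            }
            conn.commit();
        }
    }
}
{code}

Note that the split on the array delimiter happens outside the CSV parser, which is roughly where any pass-through of custom delimiters mentioned in the description would live.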