Michael Howard created HIVE-14632:
-------------------------------------

             Summary: beeline outputformat needs better documentation
                 Key: HIVE-14632
                 URL: https://issues.apache.org/jira/browse/HIVE-14632
             Project: Hive
          Issue Type: Improvement
          Components: Beeline
    Affects Versions: 0.14.0
         Environment: Hive HiveServer2 wiki
            Reporter: Michael Howard

SUMMARY
* The wiki page needs better documentation for the beeline outputformat option.
* It should explicitly say that double quote characters are used to enclose fields which need enclosing.
* It should describe the treatment of embedded double quote chars as "doubled".

DETAIL
The page at:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Separated-ValueOutputFormats
describes the separated-value outputformats csv/tsv/csv2/tsv2, etc.

I found the doc to be inadequate and the terminology to be confusing.

> These conform better to standard CSV convention, which adds quotes around a
> cell value

What kind of quotes? The only reference to quotes in this section refers to single quotes for the deprecated csv/tsv format.

The JIRA at
https://issues.apache.org/jira/browse/HIVE-8615
clarifies a bit:
- Old format quoted every field. New format quotes only fields that contain a delimiter or the quoting char.
- Old format quoted using single quotes, new format quotes using double quotes.
- Old format didn't escape quotes in a field (a bug). New format does escape the quotes.

However, neither this JIRA page nor the wiki page doc defines what is meant by "escaping the quotes".

Q: In this context, does escaping mean "backslash escaping" or "doubling embedded double quotes" or something else?

Investigation of the source code reveals that this is using SuperCSV. SuperCSV does not support backslash-escape of embedded quotes. See the last line of:
https://super-csv.github.io/super-csv/csv_specification.html

THE END
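
For reference, the quoting convention discussed above (enclose a field in double quotes only when it contains the delimiter or a double quote, and write each embedded double quote as two double quotes) can be sketched as follows. This is an illustration only: it uses Python's standard csv module rather than SuperCSV or Beeline, the helper name format_row is hypothetical, and it has not been checked against actual Beeline csv2/tsv2 output.

```python
import csv
import io


def format_row(fields, delimiter=","):
    """Render one row using minimal quoting and doubled embedded quotes.

    Illustrative helper only; not Beeline's or SuperCSV's implementation.
    """
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only the fields that need it
        doublequote=True,           # an embedded " is written as ""
        lineterminator="\n",
    )
    writer.writerow(fields)
    # Drop the row terminator so callers get just the formatted row.
    return buf.getvalue().rstrip("\n")


if __name__ == "__main__":
    # The plain field is left bare; the other two are enclosed in double quotes.
    print(format_row(["plain", "has,comma", 'say "hi"']))
```

A field with neither the delimiter nor a quote character is emitted unquoted, which is the difference from the old format described in HIVE-8615, where every field was quoted.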