[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #23882: [enhancement](broker-load) support compress type for old broker load, and split compress type from file format

via GitHub Sun, 10 Sep 2023 06:00:28 -0700


TangSiyang2001 commented on code in PR #23882:
URL: https://github.com/apache/doris/pull/23882#discussion_r1320766043



##########
fe/fe-core/src/main/java/org/apache/doris/common/util/Util.java:
##########
@@ -534,29 +536,38 @@ public static String bytesToHex(byte[] bytes) {
 
 
     @NotNull
-    public static TFileFormatType getFileFormatType(String path) {
+    public static TFileFormatType getFileFormatTypeFromPath(String path) {
         String lowerCasePath = path.toLowerCase();
-        if (lowerCasePath.endsWith(".parquet") || lowerCasePath.endsWith(".parq")) {
+        if (lowerCasePath.contains(".parquet") || lowerCasePath.contains(".parq")) {
             return TFileFormatType.FORMAT_PARQUET;
-        } else if (lowerCasePath.endsWith(".gz")) {
-            return TFileFormatType.FORMAT_CSV_GZ;
-        } else if (lowerCasePath.endsWith(".bz2")) {
-            return TFileFormatType.FORMAT_CSV_BZ2;
-        } else if (lowerCasePath.endsWith(".lz4")) {
-            return TFileFormatType.FORMAT_CSV_LZ4FRAME;
-        } else if (lowerCasePath.endsWith(".lzo")) {
-            return TFileFormatType.FORMAT_CSV_LZOP;
-        } else if (lowerCasePath.endsWith(".lzo_deflate")) {
-            return TFileFormatType.FORMAT_CSV_LZO;
-        } else if (lowerCasePath.endsWith(".deflate")) {
-            return TFileFormatType.FORMAT_CSV_DEFLATE;
-        } else if (lowerCasePath.endsWith(".snappy")) {
-            return TFileFormatType.FORMAT_CSV_SNAPPYBLOCK;
+        } else if (lowerCasePath.contains(".orc")) {
+            return TFileFormatType.FORMAT_ORC;
+        } else if (lowerCasePath.contains(".json")) {
+            return TFileFormatType.FORMAT_JSON;
         } else {
             return TFileFormatType.FORMAT_CSV_PLAIN;
         }
     }
 
+    public static TFileFormatType getFileFormatTypeFromName(String formatName) {
+        String lowerFileFormat = Objects.requireNonNull(formatName).toLowerCase();
+        if (lowerFileFormat.equals("parquet")) {
+            return TFileFormatType.FORMAT_PARQUET;
+        } else if (lowerFileFormat.equals("orc")) {
+            return TFileFormatType.FORMAT_ORC;
+        } else if (lowerFileFormat.equals("json")) {
+            return TFileFormatType.FORMAT_JSON;
+            // csv/csv_with_name/csv_with_names_and_types treat as csv format
+        } else if (lowerFileFormat.equals(FeConstants.csv) || lowerFileFormat.equals(FeConstants.csv_with_names)
+                || lowerFileFormat.equals(FeConstants.csv_with_names_and_types)
+                // TODO: Add TEXTFILE to TFileFormatType to Support hive text file format.
+                || lowerFileFormat.equals(FeConstants.text)) {
+            return TFileFormatType.FORMAT_CSV_PLAIN;
+        } else {
+            return TFileFormatType.FORMAT_UNKNOWN;

Review Comment:
   This original approach is used too widely across FE; changing it will 
probably need a separate, dedicated PR.
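
For readers following the diff, here is a minimal sketch of the split the PR title describes: the file format is guessed with a substring match, and the compress type is guessed separately from the suffix. `PathSniffer`, `FileFormat` and `CompressType` are hypothetical stand-ins introduced only for illustration; they are not Doris's Thrift-generated types or the code in this PR. The suffix list mirrors the branches removed in the hunk above.

```java
// Hypothetical sketch: FileFormat and CompressType are illustrative stand-ins,
// not Doris's Thrift-generated enums, and PathSniffer is not a Doris class.
final class PathSniffer {
    enum FileFormat { PARQUET, ORC, JSON, CSV_PLAIN }

    enum CompressType { GZ, BZ2, LZ4FRAME, LZOP, LZO, DEFLATE, SNAPPYBLOCK, PLAIN }

    private PathSniffer() {
    }

    // Format: substring match, so "data.json.gz" is still recognized as JSON.
    static FileFormat formatFromPath(String path) {
        String p = path.toLowerCase(java.util.Locale.ROOT);
        if (p.contains(".parquet") || p.contains(".parq")) {
            return FileFormat.PARQUET;
        } else if (p.contains(".orc")) {
            return FileFormat.ORC;
        } else if (p.contains(".json")) {
            return FileFormat.JSON;
        }
        // Anything unrecognized falls back to plain CSV, as in the diff.
        return FileFormat.CSV_PLAIN;
    }

    // Compression: suffix match only, using the suffixes removed in the diff.
    static CompressType compressFromPath(String path) {
        String p = path.toLowerCase(java.util.Locale.ROOT);
        if (p.endsWith(".gz")) {
            return CompressType.GZ;
        } else if (p.endsWith(".bz2")) {
            return CompressType.BZ2;
        } else if (p.endsWith(".lz4")) {
            return CompressType.LZ4FRAME;
        } else if (p.endsWith(".lzo")) {
            return CompressType.LZOP;
        } else if (p.endsWith(".lzo_deflate")) {
            return CompressType.LZO;
        } else if (p.endsWith(".deflate")) {
            return CompressType.DEFLATE;
        } else if (p.endsWith(".snappy")) {
            return CompressType.SNAPPYBLOCK;
        }
        return CompressType.PLAIN;
    }
}
```

One trade-off of the substring match: a path such as `/warehouse/t.orc.d/part-0.csv` would be classified as ORC because `.orc` appears in a directory name, which the earlier `endsWith` check would not have matched.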



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


