huangzengtian opened a new issue, #846:
URL: https://github.com/apache/fesod/issues/846

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/fesod/issues) and found nothing similar.
   
   
   ### Fesod version
   
   1.3.0
   
   ### JDK version
   
   1.8
   
   ### Operating system
   
   _No response_
   
   ### Steps To Reproduce
   
   ```
   String path = "/aws-report.tsv";
   try (FileInputStream inputStream = new FileInputStream(path)) {
   
       FastExcel.read(inputStream)
               // Amazon report documents are in TSV format
               .csv()
               .delimiter(CsvConstant.TAB)
               .quote(CsvConstant.DOUBLE_QUOTE, QuoteMode.MINIMAL)
               .registerReadListener(new PageReadListener<>(list -> {
                   System.out.println(list);
               }, 1))
               .doRead();
   } catch (IOException e) {
       throw new RuntimeException(e);
   }
   ```
   The error is as follows:
   <img width="1342" height="262" alt="Image" src="https://github.com/user-attachments/assets/34c05a94-ee0b-4e95-b69c-1d8627b8eaae" />
   The CSV file being read: [aws-report.tsv](https://github.com/user-attachments/files/25062857/aws-report.tsv) (the Title value in the last row contains quotation marks, which makes parsing fail).
   
   ### Current Behavior
   
   Reading a CSV value that contains quotes throws an error: the `QuoteMode.MINIMAL` argument in `quote(CsvConstant.DOUBLE_QUOTE, QuoteMode.MINIMAL)` has no effect.
   The official documentation says:
   <img width="1631" height="535" alt="Image" src="https://github.com/user-attachments/assets/15162768-9277-4c6f-b3bd-19e86efc4787" />
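   Judging from the Javadoc of `org.apache.commons.csv.CSVFormat.Builder#setQuoteMode()` in the underlying apache-commons-csv dependency, `QuoteMode` only applies when writing CSV; the `MINIMAL` value handles values that contain the delimiter or the quote character on write. However, **no matter what `QuoteMode` is set to, it has no effect when reading CSV**.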
   <img width="691" height="399" alt="Image" src="https://github.com/user-attachments/assets/75090bea-9c30-4720-8c5b-7293a5f53e19" />
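   To make the write-side meaning of "minimal" quoting concrete, here is a simplified sketch in plain Java. It is not the Commons CSV implementation (the real `QuoteMode.MINIMAL` logic has additional rules); `MinimalQuoteSketch`, its methods, and the sample title are hypothetical. It only illustrates that minimal quoting is a decision made when *producing* output, so by itself it says nothing about how an existing file is tokenized on read.

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.StringJoiner;

   /**
    * Hypothetical, simplified illustration of "minimal" quoting on write.
    * NOT the Commons CSV implementation of QuoteMode.MINIMAL; it only checks
    * for the delimiter, the quote character, CR and LF.
    */
   class MinimalQuoteSketch {

       // Quote a value only if it contains a character that would otherwise break the record.
       static String quoteIfNeeded(String value, char delimiter, char quote) {
           boolean needsQuotes = value.indexOf(delimiter) >= 0
                   || value.indexOf(quote) >= 0
                   || value.indexOf('\r') >= 0
                   || value.indexOf('\n') >= 0;
           if (!needsQuotes) {
               return value;
           }
           // Escape embedded quote characters by doubling them, then enclose the value.
           String q = String.valueOf(quote);
           return q + value.replace(q, q + q) + q;
       }

       // Join the values of one record, quoting each one only when needed.
       static String formatRecord(List<String> values, char delimiter, char quote) {
           StringJoiner joiner = new StringJoiner(String.valueOf(delimiter));
           for (String value : values) {
               joiner.add(quoteIfNeeded(value, delimiter, quote));
           }
           return joiner.toString();
       }

       public static void main(String[] args) {
           // The made-up title contains a quote, so only that field is quoted on output.
           System.out.println(formatRecord(Arrays.asList("B0001", "12\" Pizza Pan"), '\t', '"'));
       }
   }
   ```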
   I then ran the same file through plain apache-commons-csv code and got the same error, which confirms this hypothesis:
   ```
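   import java.io.FileReader;
   import java.io.IOException;
   import java.io.Reader;
   import org.apache.commons.csv.CSVFormat;
   import org.apache.commons.csv.CSVParser;
   import org.apache.commons.csv.CSVRecord;

   String path = "/aws-report.tsv";

   CSVFormat format = CSVFormat.DEFAULT.builder()
           .setHeader()                 // take the column names from the first record
           .setSkipHeaderRecord(true)
           .setDelimiter('\t')
           .setQuote('"')               // double quote as the quote character; the last row fails
           .build();

   // Both the reader and the parser are closed by try-with-resources
   try (Reader in = new FileReader(path);
        CSVParser parser = new CSVParser(in, format)) {
       for (CSVRecord record : parser) {
           System.out.println(record);
       }
   } catch (IOException e) {
       throw new RuntimeException(e);
   }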
   ```
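   The failure comes from how the reader tokenizes fields, not from `QuoteMode`. Below is a hypothetical, simplified tokenizer (`QuotedFieldSketch` is made up for illustration and is not the Commons CSV lexer). It follows the general rule that a quote character is only special at the start of a field, and that a closed quoted field must be followed by the delimiter or the end of the line. Assuming the Title value begins with a quote (the sample `"12" Pizza Pan` is invented), a reader following this rule rejects the line whatever write-side quote mode is configured, while a quote character that never occurs in the data lets the line through.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /**
    * Hypothetical, simplified tokenizer for one delimited line. NOT the Commons CSV lexer.
    * Rules: a quote is only special at the start of a field; a closed quoted field
    * must be followed by the delimiter or the end of the line.
    */
   class QuotedFieldSketch {

       static List<String> parseLine(String line, char delimiter, char quote) {
           List<String> fields = new ArrayList<>();
           int i = 0;
           while (true) {
               StringBuilder field = new StringBuilder();
               if (i < line.length() && line.charAt(i) == quote) {
                   i++; // skip the opening quote
                   while (true) {
                       if (i >= line.length()) {
                           throw new IllegalStateException("end of line inside a quoted field");
                       }
                       char c = line.charAt(i++);
                       if (c == quote) {
                           if (i < line.length() && line.charAt(i) == quote) {
                               field.append(quote); // doubled quote = one literal quote
                               i++;
                           } else {
                               break; // closing quote
                           }
                       } else {
                           field.append(c);
                       }
                   }
                   if (i < line.length() && line.charAt(i) != delimiter) {
                       throw new IllegalStateException(
                               "invalid character '" + line.charAt(i) + "' after closing quote at index " + i);
                   }
               } else {
                   // Unquoted field: quote characters here are ordinary data.
                   while (i < line.length() && line.charAt(i) != delimiter) {
                       field.append(line.charAt(i++));
                   }
               }
               fields.add(field.toString());
               if (i >= line.length()) {
                   return fields;
               }
               i++; // skip the delimiter
           }
       }

       public static void main(String[] args) {
           // A quote in the middle of an unquoted field is accepted as data.
           System.out.println(parseLine("B0001\tPan 12\" wide", '\t', '"'));
           // A field that starts with a quote but continues after the closing quote is rejected.
           try {
               parseLine("B0001\t\"12\" Pizza Pan", '\t', '"');
           } catch (IllegalStateException e) {
               System.out.println("parse error: " + e.getMessage());
           }
           // A quote character that never occurs in the data ('\0') effectively disables quoting.
           System.out.println(parseLine("B0001\t\"12\" Pizza Pan", '\t', '\0'));
       }
   }
   ```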
   
   ### Expected Behavior
   
   The current workaround is to set quote to `null` and then manually strip the leading and trailing quotes when processing the values. This works with apache-commons-csv:
   ```
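   import java.io.FileReader;
   import java.io.IOException;
   import java.io.Reader;
   import org.apache.commons.csv.CSVFormat;
   import org.apache.commons.csv.CSVParser;
   import org.apache.commons.csv.CSVRecord;

   String path = "/aws-report.tsv";

   CSVFormat format = CSVFormat.DEFAULT.builder()
           .setHeader()
           .setSkipHeaderRecord(true)
           .setDelimiter('\t')
           .setQuote(null) // quotes become part of the value, so values keep their leading/trailing quotes
           .build();

   try (Reader in = new FileReader(path);
        CSVParser parser = new CSVParser(in, format)) {
       for (CSVRecord record : parser) {
           System.out.println(record);
       }
   } catch (IOException e) {
       throw new RuntimeException(e);
   }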
   ```
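   With quoting disabled, every tab is a field separator and the quote characters stay in the values, so the cleanup happens afterwards. The sketch below shows that two-step idea on a single line in plain Java; `NoQuoteTsvSketch` and the sample line are hypothetical, and the stripping rule mirrors the one the converter in this issue uses. One consequence of disabling quotes, assuming the file ever contains them: a tab or line break inside a quoted value would now split that value.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;

   /**
    * Hypothetical sketch: read one TSV line with quoting disabled, then strip
    * one pair of enclosing quotes from each value.
    */
   class NoQuoteTsvSketch {

       // With no quote character, every tab separates fields; -1 keeps trailing empty fields.
       static List<String> splitWithoutQuoting(String line) {
           return Arrays.asList(line.split("\t", -1));
       }

       // Strip the quotes only when the value both starts and ends with one.
       static String stripEnclosingQuotes(String value) {
           if (value != null && value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
               return value.substring(1, value.length() - 1);
           }
           return value;
       }

       static List<String> readLine(String line) {
           List<String> values = new ArrayList<>();
           for (String raw : splitWithoutQuoting(line)) {
               values.add(stripEnclosingQuotes(raw));
           }
           return values;
       }

       public static void main(String[] args) {
           // The title keeps its inner quotes; the fully quoted price loses its outer pair.
           System.out.println(readLine("B0001\t\"12\" Pizza Pan\t\"19.99\""));
       }
   }
   ```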
   The Javadoc of `org.apache.commons.csv.CSVFormat.Builder#setQuote(java.lang.Character)` states that passing `null` disables the quote character.
   However, fesod adds a blunt `null` check that intercepts this behavior outright, which somewhat goes against the intent of apache-commons-csv. See `cn.idev.excel.read.builder.CsvReaderBuilder#quote(java.lang.Character, org.apache.commons.csv.QuoteMode)`:
   <img width="635" height="341" alt="Image" src="https://github.com/user-attachments/assets/a1222f94-f207-40a4-8cf7-fff03ebe7475" />
   Since fesod does not allow setting quote to `null`, I tried `'\0'`, which has a similar meaning, and it worked. The final code that parses the file successfully is:
   ```
   String path = "/aws-report.tsv";
   try (FileInputStream inputStream = new FileInputStream(path)) {

       FastExcel.read(inputStream)
               // Amazon report documents are in TSV format
               .csv()
               .delimiter(CsvConstant.TAB)
               .quote('\0') // a null quote character is ignored by this builder;
                            // '\0' gives the same effect as setQuote(null) in Apache Commons CSV
               .registerConverter(new QuoteCleanupConverter())
               .registerReadListener(new PageReadListener<>(list -> {
                   System.out.println(list);
               }, 1))
               .doRead();
   } catch (IOException e) {
       throw new RuntimeException(e);
   }
   ```
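   The `'\0'` workaround relies on the quote character never occurring in the data, since the reader only treats a character as a quote when it actually encounters it. Assuming the report is UTF-8 (where a 0x00 byte can only encode U+0000), a pre-check like the hypothetical `SentinelQuoteCheck.containsNul` below can confirm that a given file contains no NUL characters before relying on the trick.

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;
   import java.nio.file.Paths;

   /**
    * Hypothetical pre-check for the '\0' quote workaround: scans a stream for NUL bytes.
    * Assumes a UTF-8 file, where a 0x00 byte can only encode the character U+0000.
    */
   class SentinelQuoteCheck {

       // Returns true as soon as a 0x00 byte is found; the caller closes the stream.
       static boolean containsNul(InputStream in) throws IOException {
           byte[] buffer = new byte[8192];
           int n;
           while ((n = in.read(buffer)) != -1) {
               for (int i = 0; i < n; i++) {
                   if (buffer[i] == 0) {
                       return true;
                   }
               }
           }
           return false;
       }

       public static void main(String[] args) throws IOException {
           if (args.length > 0) {
               // Pass the report path as the first argument to check a real file.
               try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                   System.out.println("contains NUL: " + containsNul(in));
               }
           } else {
               byte[] sample = "B0001\t\"12\" Pizza Pan\n".getBytes(StandardCharsets.UTF_8);
               System.out.println("contains NUL: " + containsNul(new ByteArrayInputStream(sample)));
           }
       }
   }
   ```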
   Converter code:
   ```
   /**
    * Converter that strips the leading and trailing quotes from values on read.
    */
   public class QuoteCleanupConverter implements Converter<String> {

       @Override
       public Class<?> supportJavaTypeKey() {
           return String.class;
       }

       @Override
       public CellDataTypeEnum supportExcelTypeKey() {
           return CellDataTypeEnum.STRING;
       }

       @Override
       public String convertToJavaData(ReadCellData<?> cellData, ExcelContentProperty contentProperty,
                                       GlobalConfiguration globalConfiguration) {
           String value = cellData.getStringValue();
           if (value == null) {
               return null;
           }

           // Strip the quotes only if the value both starts and ends with one
           if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
               return value.substring(1, value.length() - 1);
           }

           return value;
       }
   }
   ```
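   One limitation of the converter above: it removes the outer pair of quotes but leaves any escaped inner quotes untouched. If the report escapes embedded quotes by doubling them, as RFC 4180 does (an assumption; the attached file has not been checked for this), the same rule could also collapse them. `QuoteCleanupSketch.cleanup` is a hypothetical helper showing only the string logic, which could be called from `convertToJavaData`.

   ```java
   /**
    * Hypothetical variant of the stripping rule in QuoteCleanupConverter that also
    * turns doubled quotes ("") inside an enclosed value back into single quotes.
    * Assumes the source escapes embedded quotes RFC 4180 style.
    */
   class QuoteCleanupSketch {

       static String cleanup(String value) {
           if (value == null) {
               return null;
           }
           if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
               // Remove the enclosing pair, then collapse escaped quotes.
               return value.substring(1, value.length() - 1).replace("\"\"", "\"");
           }
           // Values without an enclosing pair are returned unchanged.
           return value;
       }

       public static void main(String[] args) {
           System.out.println(cleanup("\"12\"\" Pizza Pan\""));
           System.out.println(cleanup("Pan 12\" wide"));
       }
   }
   ```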
   
   ### Anything else?
   
   The above is just my own brief analysis; I would appreciate a response from the team on this issue.
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!

