jorisvandenbossche commented on issue #43604: URL: https://github.com/apache/arrow/issues/43604#issuecomment-2286212175
With the original bz2 file, reading 2 chunks with a 2**30 blocksize (as in your original example) and then running `tail -n 50 test_out.csv` gives:

<details>

```
CCNC(=S)NCCCOCC1CCCO1 m_8____134546____83696
O=C(NCCCOCC1CCCO1)C(Br)Br m_22____4836780____20825978
O=C(NCCCOCC1CCCO1)C(Cl)Cl m_22____4836780____20853970
CCC(CC)NCCCOCC1CCCO1 m_270004____8288964____8798310
N#CCC(=O)NCCCOCC1CCCO1 m_22____4836780____8794444
CC(NCCCOCC1CCCO1)C1CC1 m_269862____7193234____7200230
COCC(=O)NCCCOCC1CCCO1 m_527____154721____157834
CC(C)C(C)NCCCOCC1CCCO1 m_270004____8288964____8334612
C(CNC1CCSC1)COCC1CCCO1 m_270004____8288964____8157326
CCS(=O)(=O)NCCCOCC1CCCO1 m_40____134551____6289440
CN(C)C(=O)NCCCOCC1CCCO1 m_68____1082804____22160914
CCNC(N)=NCCCOCC1CCCO1 m_264822____3500278____3499528
CNC(=NCCCOCC1CCCO1)NC m_264822____3500278____3499530
NC(=O)CCNCCCOCC1CCCO1 m_273610____11522760____11515976
CSCC(=O)NCCCOCC1CCCO1 m_527____154721____237535
CCC(O)CNCCCOCC1CCCO1 m_273610____11522760____11516276
CCCC(C)NCCCOCC1CCCO1 m_269862____7193234____7200560
CN(C)C(N)=NCCCOCC1CCCO1 m_264822____3500278____3499612
NCCC(=O)NCCCOCC1CCCO1 m_240690____7353008____3025680
CCC(C)CNCCCOCC1CCCO1 m_270004____8288964____9139664
CC(N)C(=O)NCCCOCC1CCCO1 m_240690____7353008____3025692
O=C(CCBr)NCCCOCC1CCCO1 m_270062____7616918____20844442
O=C(CCO)NCCCOCC1CCCO1 m_487____151097____12879784
O=S(=O)(CCl)NCCCOCC1CCCO1 m_40____134551____10300412
CC(C)(C)CNCCCOCC1CCCO1 m_270004____8288964____7548792
NC(=O)C(=O)NCCCOCC1CCCO1 m_22____4836780____12053348
NOCC(=O)NCCCOCC1CCCO1 m_240690____7353008____13151248
CC(CCO)NCCCOCC1CCCO1 m_270004____8288964____13397824
CNCC(=O)NCCCOCC1CCCO1 m_240690____7353008____7368582
CSCC(C)NCCCOCC1CCCO1 m_270004____8288964____15781876
C=CCCCNCCCOCC1CCCO1 m_270004____8288964____17507834
COCC(C)NCCCOCC1CCCO1 m_270004____8288964____8844972
C=C(Cl)C(=O)NCCCOCC1CCCO1 m_22____4836780____20839824
C1=NN=C(NCCCOCC2CCCO2)S1 s_27____134549____6894080
ClCCCCNCCCOCC1CCCO1 m_270004____8288964____24721168
CCNC(=O)NCCCOCC1CCCO1 m_2554____899626____951600
C(CNC1CCNC1)COCC1CCCO1 m_271302____10888820____10888454
NC1CC(NCCCOCC2CCCO2)C1 m_271302____10888820____15791668
C(CNC1=NCCN1)COCC1CCCO1 m_58668____2550466____2679600
C(CNCC1CCC1)COCC1CCCO1 m_270004____8288964____24715052
C[C@@H](O)C(=O)NCCCOCC1CCCO1 m_22____4836780____17541882
C(CNC1CCOC1)COCC1CCCO1 m_270004____8288964____8157394
C(CNCC1CNC1)COCC1CCCO1 m_271302____10888820____25575958
C1=CCC(NCCCOCC2CCCO2)C1 m_207____134553____26295706
[2H]C([2H])([2H])N(C)C(=O)NCCCOCC1CCCO1 m_2708____906386____23466438
C(CNCC1CCN1)COCC1CCCO1 m_271302____10888820____25576026
C1=COC(NCCCOCC2CCCO2)=N1 s_27____134549____11243710
CC(C)(N)CNCCCOCC1CCCO1 m_271302____10888820____8904400
NC1(CNCCCOCC2CCCO2)CC1 m_271302____10888820____25579862
CNS(=O)(=O)NCCCOCC1CCCO1 m_40____134551____9420296
```

</details>

I'm not familiar with the file, but that _looks_ OK?

Also, do you see the same garbled output file when you run the simplified reproducer with generated data that I posted above?
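
If eyeballing the `tail` output is not convincing enough, a rough programmatic check along these lines might help. This is only a sketch under assumptions: the helper name `malformed_tail_lines` is made up for illustration, and it assumes every valid row has exactly two whitespace-separated fields (a SMILES string and an ID), which is what the output above suggests but which I have not verified against the full file.

```python
from collections import deque


def malformed_tail_lines(path, n=50, expected_fields=2):
    """Return (line_text, field_count) for each of the last n lines whose
    whitespace-separated field count differs from expected_fields.

    Hypothetical helper: the two-field layout is an assumption based on
    the tail output shown above, not a documented property of the file.
    """
    # errors="replace" lets us read past invalid bytes in a garbled file
    # instead of raising UnicodeDecodeError.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        tail = deque(f, maxlen=n)  # keeps only the last n lines in memory

    bad = []
    for line in tail:
        fields = line.split()
        if len(fields) != expected_fields:
            bad.append((line.rstrip("\n"), len(fields)))
    return bad


if __name__ == "__main__":
    # Example usage against the output file from the reproducer.
    for text, count in malformed_tail_lines("test_out.csv"):
        print(f"{count} field(s): {text!r}")
```

An empty result only means the last `n` lines have the expected shape; it does not rule out corruption earlier in the file, so increasing `n` (or scanning the whole file) may be needed to locate the garbled region.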
