Miyake-Diogo opened a new issue, #3783:
URL: https://github.com/apache/arrow-datafusion/issues/3783

   **Describe the bug**
   When I try to save a DataFrame as CSV, only around 400K lines are saved, 
although the data has more than 1M lines.
   
   **To Reproduce**
   My code: 
   ``` rust
   use datafusion::prelude::*;
   use log::{debug, info, LevelFilter, trace};
   use crate::datapipeline::data_utils::*;
   pub mod datapipeline;
   use datafusion::logical_plan::when;
   
   use datafusion::arrow::datatypes::DataType::{Int64,Utf8};
   #[tokio::main]
   async fn main() -> datafusion::error::Result<()> {
     let ctx: SessionContext = SessionContext::new();
     let raw_fato_path: &str = "data/minilake/raw/fato_census/Data8277.csv";
     let stage_fato_path: &str = "data/minilake/stage/fato_census/";
     let fato_census_df = ctx.read_csv(raw_fato_path,  
                                     CsvReadOptions::new()).await?;
     
     let fato_census_df = fato_census_df.with_column("area",cast(
       col("Area"),
       Utf8))?;
   
     let fato_census_df = fato_census_df
       //.with_column("Area",concat_ws("-", &vec![lit("A"),col("Area")]))?
       .select(vec![
         col("Year").alias("year"),
         col("Age").alias("age"),
         col("Ethnic").alias("ethnic"),
         col("Sex").alias("sex"),
         col("Area").alias("area"),
         col("count").alias("total_count")
         ])?;
     
     // The "..C" placeholder values are visible in the count column
     fato_census_df.show_limit(5).await?;
     print_schema_of_dataframe(&fato_census_df).await?;
     // Build an expression that replaces "..C" with 0 and keeps other values
     let transform_count_data = when(col("total_count")
       .eq(lit("..C")), lit(0_u32))
       .otherwise(col("total_count"))?;
   
     // Cast the column to Int64
     let fato_census_df = fato_census_df.with_column(
       "total_count",
       cast(transform_count_data, Int64))?;
     
     fato_census_df.write_csv(stage_fato_path).await?;
   
     Ok(())
     }
   ```
   Dataset: 
   
   [Age and sex by ethnic group (grouped total responses), for census usually 
resident population counts, 2006, 2013, and 2018 Censuses (RC, TA, SA2, 
DHB)](https://www3.stats.govt.nz/2018census/Age-sex-by-ethnic-group-grouped-total-responses-census-usually-resident-population-counts-2006-2013-2018-Censuses-RC-TA-SA2-DHB.zip?_ga=2.148542962.457556406.1664998127-985979153.1663098055)
   **Expected behavior**
   All lines should be saved: 
   
   <img width="845" alt="image" 
src="https://user-images.githubusercontent.com/24550387/194943530-8082c81a-18d1-45be-89fc-df4e54bec121.png">
   
   
   But only this many lines are saved:
   <img width="790" alt="image" 
src="https://user-images.githubusercontent.com/24550387/194943662-df2a4f9a-b7cf-419f-a08f-d66b3c80eb08.png">
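
   To rule out a counting mistake, the saved rows can be totalled across every file in the output directory, since the output may be split into several part files (an assumption, not something confirmed in this report). Below is a minimal sketch using only the standard library. The helper name `count_csv_rows` is hypothetical, and it assumes that no quoted field contains an embedded newline and that each file carries at most one header line.

```rust
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Hypothetical helper: counts non-empty data lines across every regular
/// file directly inside `dir`. When `has_header` is true, one line per
/// non-empty file is treated as a header and not counted.
/// Assumes no quoted CSV field spans multiple lines.
pub fn count_csv_rows(dir: &Path, has_header: bool) -> io::Result<usize> {
    let mut total = 0usize;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Skip subdirectories and anything that is not a regular file.
        if !path.is_file() {
            continue;
        }
        let reader = BufReader::new(File::open(&path)?);
        let mut rows = 0usize;
        for line in reader.lines() {
            // Propagate I/O or invalid UTF-8 errors instead of hiding them.
            if !line?.trim().is_empty() {
                rows += 1;
            }
        }
        // Drop the header line, if the file has any content at all.
        if has_header && rows > 0 {
            rows -= 1;
        }
        total += rows;
    }
    Ok(total)
}
```

   Calling `count_csv_rows(Path::new("data/minilake/stage/fato_census/"), true)` after the program finishes would give a total to compare against the row count of the raw file. Whether each written file starts with a header line is an assumption here; pass `false` if it does not.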
   
   

