[Spark SQL] Data objects from query history

2023-06-30 Thread Ruben Mennes
Dear Apache Spark community, I hope this email finds you well. My name is Ruben, and I am an enthusiastic user of Apache Spark, specifically through the Databricks platform. I am reaching out to you today to seek your assistance and guidance regarding a specific use case. I have been

Re: use case reading files split per id

2016-11-16 Thread ruben
Yes, that binary files function looks interesting, thanks for the tip. Some follow-up questions: - I always wonder what people mean when they talk about 'small' files and 'large' files. Is there any rule of thumb for when these terms apply? Are small files those which can fit completely in memory on the node

use case reading files split per id

2016-11-08 Thread ruben
Hey, we have files organized on HDFS in this manner:
base_folder
|- <ID>
   |- file1
   |- file2
   |- ...
|- <ID>
   |- file1
   |- file2
   |- ...
|- ...
We want to be able to do the following operation on our data: - for each ID we want to