Hello, how are you? Thanks for your time.

> Does the data contain records?
Yes.

> Are the records "homogenous"; i.e., do they have the same fields?
Yes, the data is homogeneous, but there are two layouts in the same file.

> What is the format of the data?
All of the data is strings, in a .txt file.

> Are records separated by lines/separators?
Yes, the delimiter is "#", but as I said, we have two layouts in the same file.

This part looks like key-value pairs:
> Carteira em#27/12/2019##Todos os Beneficiários
> Operadora#AMIL
> Filial#SÃO PAULO#Unidade#Guarulhos
>
> Contrato#123456 - Test
> Empresa#Test

And this part looks like CSV:
> Plano#Código Beneficiário#Nome Beneficiário
> 58693 - NACIONAL R COPART PJCE#073930312#Joao Silva
> 58693 - NACIONAL R COPART PJCE#073930313#Maria Silva

> Is the data sharded across multiple files?
No.

> How big is each shard?
Approximately 20 GB.

> On 8 Feb 2022, at 16:56, Lalwani, Jayesh <jlalw...@amazon.com> wrote:
>
> You will need to provide more info.
>
> Does the data contain records?
> Are the records "homogenous"; i.e., do they have the same fields?
> What is the format of the data?
> Are records separated by lines/separators?
> Is the data sharded across multiple files?
> How big is each shard?
>
> On 2/8/22, 11:50 AM, "Danilo Sousa" <danilosousa...@gmail.com> wrote:
>
>     Hi
>     I have to transform unstructured text to dataframe.
>     Could anyone please help with Scala code?
>
>     The dataframe should look like:
>
>     operadora filial unidade contrato empresa plano codigo_beneficiario nome_beneficiario
>
>     Relação de Beneficiários Ativos e Excluídos
>     Carteira em#27/12/2019##Todos os Beneficiários
>     Operadora#AMIL
>     Filial#SÃO PAULO#Unidade#Guarulhos
>
>     Contrato#123456 - Test
>     Empresa#Test
>     Plano#Código Beneficiário#Nome Beneficiário
>     58693 - NACIONAL R COPART PJCE#073930312#Joao Silva
>     58693 - NACIONAL R COPART PJCE#073930313#Maria Silva
>
>     Contrato#898011000 - FUNDACAO GERDAU
>     Empresa#FUNDACAO GERDAU
>     Plano#Código Beneficiário#Nome Beneficiário
>     58693 - NACIONAL R COPART PJCE#065751353#Jose Silva
>     58693 - NACIONAL R COPART PJCE#065751388#Joana Silva
>     58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva
>     58693 - NACIONAL R COPART PJCE#065751388#Julia Silva
>     ---------------------------------------------------------------------
>     To unsubscribe e-mail: user-unsubscr...@spark.apache.org
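
To make the target shape concrete, here is a minimal sketch in plain Scala (no Spark) of how the two layouts in the sample could be flattened into the requested columns. It is only an illustration under assumptions: the names `Row`, `TwoLayoutSketch`, `contextKeys` and `parse` are made up for this example, the key#value lines are assumed to apply to every beneficiary line that follows them until they are overwritten, and a beneficiary line is assumed to have exactly three "#"-separated fields. Title lines such as "Carteira em#..." are simply ignored.

```scala
// Hypothetical names throughout: Row, TwoLayoutSketch, contextKeys and parse
// exist only for this illustration.
final case class Row(
  operadora: String, filial: String, unidade: String,
  contrato: String, empresa: String,
  plano: String, codigoBeneficiario: String, nomeBeneficiario: String)

object TwoLayoutSketch {
  // Keys of the key#value layout whose values are carried onto detail rows.
  private val contextKeys =
    Set("Operadora", "Filial", "Unidade", "Contrato", "Empresa")

  // Single sequential pass: remember the latest context values and emit one
  // Row per beneficiary line. Assumes the lines arrive in file order.
  def parse(lines: Iterator[String]): Iterator[Row] = {
    var ctx = Map.empty[String, String]
    lines.flatMap { raw =>
      val fields = raw.split("#", -1).map(_.trim)
      fields.toList match {
        // Column header of the CSV-like layout: nothing to emit.
        case "Plano" :: _ =>
          None
        // key#value[#key#value] layout, e.g. Filial#SÃO PAULO#Unidade#Guarulhos
        case key :: _ if contextKeys(key) =>
          ctx ++= fields.grouped(2).collect {
            case Array(k, v) if contextKeys(k) => k -> v
          }
          None
        // Detail line: plano#codigo#nome (assumed to be exactly three fields).
        case plano :: codigo :: nome :: Nil =>
          Some(Row(
            ctx.getOrElse("Operadora", ""), ctx.getOrElse("Filial", ""),
            ctx.getOrElse("Unidade", ""), ctx.getOrElse("Contrato", ""),
            ctx.getOrElse("Empresa", ""), plano, codigo, nome))
        // Report title, "Carteira em#..." and blank lines are skipped.
        case _ =>
          None
      }
    }
  }
}
```

With the sample above, `TwoLayoutSketch.parse(lines)` would yield one `Row` per beneficiary line, each repeating the most recent Operadora, Filial, Unidade, Contrato and Empresa values. The pass depends on line order, so it does not translate directly into a per-line Spark transformation: a context line and its beneficiary lines may end up in different partitions of a 20 GB file, and that part is not addressed here.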