Hello, how are you?

Thanks for your time

> Does the data contain records? 
Yes
> Are the records "homogeneous", i.e., do they have the same fields?
Yes, the data is homogeneous, but there are two layouts in the same file.
> What is the format of the data?
All of the data is strings in a plain-text .txt file.
> Are records separated by lines/separators?
Yes, the field delimiter is “#”, but as mentioned, there are two layouts in the same file.
This part looks like key-value pairs:
>    Carteira em#27/12/2019##Todos os Beneficiários
>    Operadora#AMIL
>    Filial#SÃO PAULO#Unidade#Guarulhos
> 
>    Contrato#123456 - Test
>    Empresa#Test

And this part looks like CSV format:

>    Plano#Código Beneficiário#Nome Beneficiário
>    58693 - NACIONAL R COPART PJCE#073930312#Joao Silva
>    58693 - NACIONAL R COPART PJCE#073930313#Maria Silva

> Is the data sharded across multiple files?
No
> How big is each shard?
Approximately 20 GB
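
A minimal sketch of one way to read both layouts in a single pass, in plain Scala with no Spark dependency. It assumes that each key-value line (Operadora, Filial/Unidade, Contrato, Empresa) applies to every beneficiary line after it until the same key appears again, and that any line not matching a known shape (titles, blank lines, the "Carteira em" line) can be skipped. The names LayoutParser, Ctx, Beneficiario, parse and step are made up for illustration.

```scala
object LayoutParser {
  // One output row: the header context plus the three '#'-separated data fields.
  final case class Beneficiario(
      operadora: String, filial: String, unidade: String,
      contrato: String, empresa: String,
      plano: String, codigoBeneficiario: String, nomeBeneficiario: String)

  // Header values seen so far (assumption: they stay valid until overwritten).
  final case class Ctx(
      operadora: String = "", filial: String = "", unidade: String = "",
      contrato: String = "", empresa: String = "")

  def parse(lines: Iterator[String]): Iterator[Beneficiario] = {
    var ctx = Ctx()

    // Updates the context on key-value lines, emits a row on data lines.
    def step(line: String): Option[Beneficiario] =
      line.trim.split("#", -1) match { // limit -1 keeps empty fields
        case Array("Operadora", v) => ctx = ctx.copy(operadora = v); None
        case Array("Filial", f, "Unidade", u) =>
          ctx = ctx.copy(filial = f, unidade = u); None
        case Array("Contrato", v) => ctx = ctx.copy(contrato = v); None
        case Array("Empresa", v)  => ctx = ctx.copy(empresa = v); None
        case Array("Plano", _, _) => None // column header of the CSV-like block
        case Array(plano, codigo, nome) =>
          Some(Beneficiario(ctx.operadora, ctx.filial, ctx.unidade,
            ctx.contrato, ctx.empresa, plano, codigo, nome))
        case _ => None // titles, blank lines, anything unrecognised
      }

    lines.flatMap(line => step(line).iterator)
  }

  def main(args: Array[String]): Unit = {
    val sample = List(
      "Carteira em#27/12/2019##Todos os Beneficiários",
      "Operadora#AMIL",
      "Filial#SÃO PAULO#Unidade#Guarulhos",
      "",
      "Contrato#123456 - Test",
      "Empresa#Test",
      "Plano#Código Beneficiário#Nome Beneficiário",
      "58693 - NACIONAL R COPART PJCE#073930312#Joao Silva",
      "58693 - NACIONAL R COPART PJCE#073930313#Maria Silva")
    // Prints one Beneficiario per data line.
    parse(sample.iterator).foreach(println)
  }
}
```

Because each data row depends on the header lines before it, the lines have to be consumed in file order. The iterator avoids holding the whole file in memory, but the same logic would need care if the file were split across Spark partitions, since a partition could start in the middle of a Contrato block.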

> On 8 Feb 2022, at 16:56, Lalwani, Jayesh <jlalw...@amazon.com> wrote:
> 
> You will need to provide more info.
> 
> Does the data contain records? 
> Are the records "homogeneous", i.e., do they have the same fields?
> What is the format of the data?
> Are records separated by lines/separators?
> Is the data sharded across multiple files?
> How big is each shard?
> 
> 
> 
> On 2/8/22, 11:50 AM, "Danilo Sousa" <danilosousa...@gmail.com> wrote:
> 
>    Hi
>    I need to transform unstructured text into a dataframe.
>    Could anyone please help with Scala code?
> 
>    The dataframe needs these columns:
> 
>    operadora filial unidade contrato empresa plano codigo_beneficiario nome_beneficiario
> 
>    Relação de Beneficiários Ativos e Excluídos
>    Carteira em#27/12/2019##Todos os Beneficiários
>    Operadora#AMIL
>    Filial#SÃO PAULO#Unidade#Guarulhos
> 
>    Contrato#123456 - Test
>    Empresa#Test
>    Plano#Código Beneficiário#Nome Beneficiário
>    58693 - NACIONAL R COPART PJCE#073930312#Joao Silva
>    58693 - NACIONAL R COPART PJCE#073930313#Maria Silva
> 
>    Contrato#898011000 - FUNDACAO GERDAU
>    Empresa#FUNDACAO GERDAU
>    Plano#Código Beneficiário#Nome Beneficiário
>    58693 - NACIONAL R COPART PJCE#065751353#Jose Silva
>    58693 - NACIONAL R COPART PJCE#065751388#Joana Silva
>    58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva
>    58693 - NACIONAL R COPART PJCE#065751388#Julia Silva
>    ---------------------------------------------------------------------
>    To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 
> 


