Re: [julia-users] Question: Forcing readtable to create string type on import

LeAnthony Mathews Wed, 02 Nov 2016 13:15:31 -0700

Spoke too soon.  
Again, I simply want the CSV column that is read in to be a String, not an 
Int32.


Still having issues casting the CSV file back into a DataFrame.
It's hard to understand why Julia attempts to infer the types of the 
columns when I use readtable, and why I have no control over this.

Why can I not say:
df1 = readtable(file1; types=Dict(1=>String)) # assuming your account number is column # 1

*Reading the Julia spec-Advanced Options for Reading CSV Files*
*readtable accepts the following optional keyword arguments:*

*eltypes::Vector{DataType} – Specify the types of all columns. Defaults to 
[].*


*df1 = readtable(file1, Int32::Vector(String))*

I get 
*ERROR: TypeError: typeassert: expected Array{String,1}, got Type{Int32}*

Is this even an option?  Or could I convert df1_CSV to df1_dataframe instead, 
since CSV.read seems to give more granular control?
*df1_dataframe = convert(dataframe, df1_CSV)*
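
For what it's worth, the negative values in the original question quoted 
below (for example -571049996 where the CSV has 8018884596) are consistent 
with the ten-digit account numbers wrapping around when stored as Int32, 
whose maximum is 2147483647. A minimal sketch of that arithmetic, using only 
Base Julia; wrap_to_int32 is a hypothetical helper name for illustration, 
not something readtable exposes:

```julia
# Hypothetical helper: keep only the low 32 bits of an integer and
# reinterpret them as a signed Int32 (what a silent narrowing produces).
wrap_to_int32(x::Integer) = x % Int32

# The three account numbers from the original CSV excerpt.
accounts = Int64[8018884596, 8018893530, 8018909633]

for a in accounts
    println(a, " => ", wrap_to_int32(a))
end

# Holding the account numbers as text avoids the overflow entirely.
as_text = map(string, accounts)
```

This is one more reason to read the column as a String rather than convert 
it afterwards: once the value has wrapped, the original digits are gone.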


On Tuesday, November 1, 2016 at 7:28:36 PM UTC-4, LeAnthony Mathews wrote:
>
> Great, that worked for forcing the column into a string type.
> Thanks
>
> On Monday, October 31, 2016 at 3:26:14 PM UTC-4, Jacob Quinn wrote:
>>
>> You could use CSV.jl: http://juliadata.github.io/CSV.jl/stable/
>>
>> In this case, you'd do:
>>
>> df1 = CSV.read(file1; types=Dict(1=>String)) # assuming your account number is column # 1
>> df2 = CSV.read(file2; types=Dict(1=>String))
>>
>> -Jacob
>>
>>
>> On Mon, Oct 31, 2016 at 12:50 PM, LeAnthony Mathews <leant...@gmail.com> 
>> wrote:
>>
>>> Using v0.5.0
>>> I have two different 10,000 line CSV files that I am reading into two 
>>> different dataframe variables using the readtable function.
>>> Each table has in common a ten digit account_number that I would like to 
>>> use as an index and join into one master file.
>>>
>>> Here is the account number example in the original CSV from file1:
>>> 8018884596
>>> 8018893530
>>> 8018909633
>>>
>>> When I do a readtable of this CSV into file1 and then do a 
>>> *typeof(file1[:account_number])* I get:
>>> *DataArrays.DataArray{Int32,1}*
>>>  -571049996
>>>  -571041062
>>>  -571024959
>>>
>>> When I do a 
>>> *typeof(file2[:account_number])* I get:
>>> *DataArrays.DataArray{String,1}*
>>>
>>>
>>> *Question:  *
>>> My CSV files give no guidance on whether account_number should be Int32 
>>> or String.  How do I force both account_number columns to be of type 
>>> String?
>>>
>>> I would like this join command to work:
>>> *new_account_join = join(file1, file2, on =:account_number,kind = :left)*
>>>
>>> But I am getting this error:
>>> *ERROR: TypeError: typeassert: expected Union{Array{Symbol,1},Symbol}, got Array{Array{Symbol,1},1}*
>>> * in (::Base.#kw##join)(::Array{Any,1}, ::Base.#join, ::DataFrames.DataFrame, ::DataFrames.DataFrame) at .\<missing>:0*
>>>
>>>
>>> Any help would be appreciated.  
>>>
>>>
>>>
>>
