Hi,

I can highly recommend using pandas to read CSV files. It does a pretty good
job of guessing a lot of things without extra configuration.

Of course, it is one more dependency.
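
As a rough sketch of what that can look like: passing sep=None together with
engine="python" asks pandas to guess the delimiter from the data instead of
assuming a comma. The sample bytes, the column names and the utf-8 encoding
below are made up for illustration; in a view the bytes would come from
request.FILES['file'].read().

```python
import io

import pandas as pd

# Hypothetical upload content; in a view this would be the uploaded bytes.
raw = "name;city\nAnn;Oslo\nBob;Lima\n".encode("utf-8")

# sep=None with the python engine lets pandas sniff the delimiter itself.
df = pd.read_csv(io.BytesIO(raw), sep=None, engine="python", encoding="utf-8")
print(df.columns.tolist())
```

The delimiter detection is still a guess, so it is worth checking the
resulting columns before saving anything.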


On Fri, 24 Jul 2020 at 17:09, Ronaldo Mata <[email protected]> wrote:

> Yes, I will try it. I will let you know if anything comes up.
>
> On Wed, 22 Jul 2020 at 12:24 PM, Liu Zheng <[email protected]> wrote:
>
>> Hi,
>>
>> Are you sure that the file used for detection is the same file that you
>> opened and decoded, and that gave you incorrect results?
>>
>> By the way, ASCII is a proper subset of UTF-8. If chardet says the data is
>> ASCII, decoding it as UTF-8 should always work.
>>
>> If your file contains non-ascii UTF-8 bytes, maybe it’s a bug in chardet?
>> You can try it directly, without mixing it with django’s requests first.
>> Make sure you can detect and decode the file locally in a test program.
>> Then put it into the app.
>>
>> If you share the file, I’m also glad to help you try it.
>>
>> On Thu, 23 Jul 2020 at 12:04 AM, Ronaldo Mata <[email protected]>
>> wrote:
>>
>>> Hi Kovy, this is not solved. Liu Zheng, using
>>> chardet.detect(request.FILES['file'].read()) returns the encoding "ascii",
>>> which is not correct. For example, I uploaded a file encoded as UTF-7 and
>>> the result is wrong. I then tried
>>> request.FILES['file'].read().decode('ascii'), which does not work either:
>>> it returns bad data. For example, the string "@" comes back as "+AEA-".
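
A small sketch of why this happens, using only the bytes quoted above: UTF-7
escapes are themselves made of ASCII characters, so a detector that looks at
the bytes can reasonably answer "ascii", and decoding as ASCII succeeds while
leaving the escape sequence untouched. Only decoding as UTF-7 turns it back
into "@".

```python
raw = b"+AEA-"  # the bytes reported in the message above

# Every byte is below 128, so the data is valid ASCII (and valid UTF-8).
all_ascii = all(byte < 128 for byte in raw)

as_ascii = raw.decode("ascii")  # no error, the escape stays literal
as_utf7 = raw.decode("utf-7")   # the escape is interpreted

print(all_ascii, repr(as_ascii), repr(as_utf7))
```

So for UTF-7 uploads the encoding probably has to be known or supplied by the
user rather than detected from the bytes.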
>>>
>>> On Wed, 22 Jul 2020 at 11:16, Kovy Jacob (<[email protected]>) wrote:
>>>
>>>> I’m confused. I don’t know if I can help.
>>>>
>>>> On Jul 22, 2020, at 11:11 AM, Liu Zheng <[email protected]> wrote:
>>>>
>>>> Hi, glad you solved the problem. Yes, both request.FILES[‘file’] and the
>>>> file handle in the chardet example are binary handles. A binary handle
>>>> presents the raw data. chardet takes a sequence of raw bytes and detects
>>>> the encoding. With its prediction, if you want to read that piece of
>>>> data as text, you can use the .decode(<encoding format>) method of the
>>>> bytes object to get a Python string.
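
A minimal sketch of the binary-to-text step described above, assuming the
encoding is already known (or has been guessed). io.BytesIO stands in for the
uploaded file, and rows_from_binary is a hypothetical helper name, not part of
Django or chardet.

```python
import csv
import io


def rows_from_binary(binary_file, encoding="utf-8"):
    """Decode a binary file object on the fly and parse it as CSV."""
    # newline="" lets the csv module handle line endings itself.
    text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    rows = list(csv.DictReader(text_file))
    # Detach so the wrapper does not close the underlying binary file.
    text_file.detach()
    return rows


# io.BytesIO is a stand-in for the uploaded file in this sketch.
upload = io.BytesIO("name,city\nAnn,Málaga\n".encode("utf-8"))
rows = rows_from_binary(upload)
print(rows)
```

For small files, calling .read().decode(encoding) and wrapping the result in
io.StringIO would work just as well.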
>>>>
>>>> On Wed, 22 Jul 2020 at 11:04 PM, Kovy Jacob <[email protected]>
>>>> wrote:
>>>>
>>>>> That’s probably not the proper answer, but that’s the best I can do.
>>>>> Sorry :-(
>>>>>
>>>>>
>>>>> On Jul 22, 2020, at 10:46 AM, Ronaldo Mata <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Yes, the problem here is that the files will be uploaded by users, so
>>>>> I don't know which delimiter I will receive. This is not a base command
>>>>> that I am using; it is logic that I want to incorporate in a view.
>>>>>
>>>>> On Wed, 22 Jul 2020 at 10:43, Kovy Jacob (<[email protected]>) wrote:
>>>>>
>>>>>> Ah, so is the problem that you don’t always know what the delimiter
>>>>>> is when you read it? If yes, what is the use case for this? You might not
>>>>>> need a universal solution; maybe just put all the info into a CSV
>>>>>> yourself, manually.
>>>>>>
>>>>>> On Jul 22, 2020, at 10:39 AM, Ronaldo Mata <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Kovy, I'm using the csv module, but I need to handle the delimiters of
>>>>>> the files: sometimes they come separated by ",", other times by ";", and
>>>>>> rarely by "|".
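
One possible way to handle the three delimiters mentioned above is the
standard library's csv.Sniffer, restricted to those candidates. This is only a
sketch: sniff_delimiter, the 4096-character sample size and the comma fallback
are assumptions chosen for illustration, and the sniffer can still guess wrong
on unusual data.

```python
import csv
import io


def sniff_delimiter(text, candidates=",;|", default=","):
    """Guess the delimiter of CSV text, restricted to the given candidates."""
    sample = text[:4096]  # a few KB is usually enough for a guess
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer could not decide; fall back to the default.
        return default


# Hypothetical decoded upload content.
text = "name|city\nAnn|Oslo\nBob|Lima\n"
delimiter = sniff_delimiter(text)
rows = list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
print(delimiter, rows)
```

The text has to be decoded first, so this step comes after the encoding
question has been settled.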
>>>>>>
>>>>>> On Wed, 22 Jul 2020 at 10:28, Kovy Jacob (<[email protected]>) wrote:
>>>>>>
>>>>>>> Could you just use the standard python csv module?
>>>>>>>
>>>>>>> On Jul 22, 2020, at 10:25 AM, Ronaldo Mata <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Liu, thanks for your answer.
>>>>>>>
>>>>>>> This has been a headache. I am trying to read the file using
>>>>>>> csv.DictReader. Initially I had an error trying to get the dict keys when
>>>>>>> iterating over the rows, and I thought it could be the encoding (which is
>>>>>>> why I wanted the view to use the correct encoding). That is why I asked
>>>>>>> my question.
>>>>>>>
>>>>>>> 1) Your first approach doesn't work: if I send a UTF-8 file, chardet
>>>>>>> returns ascii as the encoding. It seems request.FILES['file'].read()
>>>>>>> returns bytes with that encoding.
>>>>>>>
>>>>>>> 2) In the end I realized that the problem was the delimiter of the
>>>>>>> CSV, but predicting it is another problem.
>>>>>>>
>>>>>>> Anyway, it was a task that I had to do and that was my limitation. I
>>>>>>> think there must be a library that does all this; uploading a CSV file
>>>>>>> is common practice in many web apps.
>>>>>>>
>>>>>>> On Tue, 21 Jul 2020 at 13:47, Liu Zheng (<[email protected]>) wrote:
>>>>>>>
>>>>>>>> Hi. First of all, I think it's impossible to perfectly detect an
>>>>>>>> encoding without further information. See the answer in this SO post:
>>>>>>>> https://stackoverflow.com/questions/436220/how-to-determine-the-encoding-of-text
>>>>>>>> There are many packages and tools that help detect the encoding, but
>>>>>>>> keep in mind that they only give educated guesses. (Most of the time the
>>>>>>>> guess is correct, but do check the dev page to see whether there are
>>>>>>>> known issues related to your problem.)
>>>>>>>>
>>>>>>>> Now let's say you have decided to use chardet. Check its doc page
>>>>>>>> for the usage:
>>>>>>>> https://chardet.readthedocs.io/en/latest/usage.html#usage
>>>>>>>> There is more than one solution. Here are some examples:
>>>>>>>>
>>>>>>>> 1. If the files uploaded to your server are all expected to be
>>>>>>>> small CSV files (less than a few MB, with few concurrent uploads),
>>>>>>>> you can do the following:
>>>>>>>>
>>>>>>>> import chardet
>>>>>>>>
>>>>>>>> # In the view that handles the uploaded file (assume the file input
>>>>>>>> # name is just "file"):
>>>>>>>> file_content = request.FILES['file'].read()
>>>>>>>> result = chardet.detect(file_content)  # dict with an 'encoding' key
>>>>>>>>
>>>>>>>> 2. Also, chardet seems to support incremental (line-by-line)
>>>>>>>> detection:
>>>>>>>> https://chardet.readthedocs.io/en/latest/usage.html#example-detecting-encoding-incrementally
>>>>>>>>
>>>>>>>> Given this, we can also read from request.FILES line by line and
>>>>>>>> pass each line to chardet:
>>>>>>>>
>>>>>>>> from chardet.universaldetector import UniversalDetector
>>>>>>>>
>>>>>>>> #somewhere in a view function
>>>>>>>> detector = UniversalDetector()
>>>>>>>> file_handle = request.FILES['file']
>>>>>>>> for line in file_handle:
>>>>>>>>     detector.feed(line)
>>>>>>>>     if detector.done:
>>>>>>>>         break
>>>>>>>> detector.close()
>>>>>>>> # result available as a dict at detector.result
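
Since any detector only gives an educated guess, one hedged alternative (plain
Python, not chardet) is to try a short list of likely encodings in order and
keep the first one that decodes without error. The function name and the
candidate list below are assumptions for illustration; the right candidates
depend on where the files come from.

```python
def decode_with_fallback(raw, encodings=("utf-8-sig", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this last resort never fails,
    # although the resulting text may be wrong.
    return raw.decode("latin-1"), "latin-1"


text, used = decode_with_fallback("café".encode("cp1252"))
print(text, used)
```

A successful decode does not prove the encoding is right, only that the bytes
are valid in it, so strict encodings such as UTF-8 should come first.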
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tuesday, July 21, 2020 at 7:09:35 AM UTC+8, Ronaldo Mata wrote:
>>>>>>>>>
>>>>>>>>> How do I deal with encoding when reading a CSV file in a view?
>>>>>>>>>
>>>>>>>>> I have a view for uploading a CSV file; in this view I read the file
>>>>>>>>> and save each row as a new record.
>>>>>>>>>
>>>>>>>>> My bug appears when I try to upload a CSV file with a
>>>>>>>>> different encoding (not UTF-8).
>>>>>>>>>
>>>>>>>>> How can I handle this in Django (using request.FILES)? I was
>>>>>>>>> researching and found chardet, but I don't know how to pass
>>>>>>>>> request.FILES to it. I need help, please.
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "Django users" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/django-users/64307441-0e65-45a2-b917-ece15a4ea729o%40googlegroups.com
>>>>>>>> <https://groups.google.com/d/msgid/django-users/64307441-0e65-45a2-b917-ece15a4ea729o%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>>
