-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/104310/#review11510
-----------------------------------------------------------


This review has been submitted with commit 
711b585e6284e1346c8661d60e6bf04c0223fd8c by Sam Lade to branch master.

- Commit Hook


On March 17, 2012, 9 a.m., Alexey Neyman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/104310/
> -----------------------------------------------------------
> 
> (Updated March 17, 2012, 9 a.m.)
> 
> 
> Review request for Amarok.
> 
> 
> Description
> -------
> 
> Amarok incorrectly scans files with non-ASCII characters in their tags. The
> symptom is that some files show two "invalid UTF character" symbols in place
> of a single non-ASCII character (displayed as <?><?>, a question mark inside
> a black circle). The most visible effect is that some albums end up in
> Various Artists because the artist name of one of their tracks was corrupted
> in this way. The issue is not limited to artist names, though: album names
> and titles can be corrupted as well.
>
> The reason for this issue is as follows. When Amarok invokes the collection
> scanner process, it receives the results from amarokcollectionscanner over a
> pipe. Here is a snippet of code from
> src/core-impl/collections/db/ScanManager.cpp:
> 
> void
> ScannerJob::getScannerOutput()
> {
>      m_incompleteTagBuffer += m_scanner->readAll();
> }
> 
> m_incompleteTagBuffer is declared in
> src/core-impl/collections/db/ScanManager.h:
> 
>      QString m_incompleteTagBuffer;
> 
> However, m_scanner->readAll() returns a QByteArray, not a QString. This is
> fine for ASCII characters (which are 1 byte in UTF-8), but breaks for
> multibyte sequences. If readAll() returns a block that ends in the middle of
> a multibyte sequence, the conversion to QString in
> ScannerJob::getScannerOutput replaces the truncated character with an
> "invalid UTF character" symbol. The next block then starts in the middle of
> the same UTF-8 multibyte sequence, so its leading bytes are replaced with one
> more "invalid UTF character" symbol. Thus, a single multibyte UTF-8 character
> is replaced with two "invalid UTF character" symbols.
> 
> The solution implemented by the attached patch is to store the incomplete
> data as a QByteArray and to search the byte stream for partial
> ("</directory>") or full ("</scanner>") elements before converting to
> QString. Complete blocks can be safely converted to QString, because the
> multibyte characters lie inside the XML elements and a block that ends at a
> closing tag cannot split one.
> 
> 
> Diffs
> -----
> 
>   src/core-impl/collections/db/ScanManager.h 5f0d153 
>   src/core-impl/collections/db/ScanManager.cpp 97d0b1c 
> 
> Diff: http://git.reviewboard.kde.org/r/104310/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexey Neyman
> 
>
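
For illustration, the buffering approach described in the quoted request can
be sketched in plain C++. This is a minimal sketch under stated assumptions,
not the committed patch: std::string stands in for QByteArray, and the names
ScannerOutputBuffer, feed, isStructurallyValidUtf8 and runExample are
hypothetical. Raw bytes are accumulated, and a block is handed out for
decoding only once a closing </directory> or </scanner> tag has arrived, so a
multibyte character split across two reads is reassembled before any
conversion takes place.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified structural UTF-8 check: validates lead and continuation bytes
// only (it does not reject overlong encodings or surrogates).
bool isStructurallyValidUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;                        // stray continuation byte
        if (i + extra >= s.size()) return false;  // sequence cut off at the end
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Hypothetical stand-in for the byte buffer: std::string plays the role of
// QByteArray, and nothing is decoded until a closing tag has been seen.
class ScannerOutputBuffer {
public:
    // Appends a raw chunk and returns every complete block now available.
    std::vector<std::string> feed(const std::string& chunk) {
        buffer_ += chunk;
        std::vector<std::string> blocks;
        for (;;) {
            const std::size_t end = findBlockEnd();
            if (end == std::string::npos) break;
            blocks.push_back(buffer_.substr(0, end));
            buffer_.erase(0, end);
        }
        return blocks;
    }

private:
    // Offset just past the earliest closing tag in the buffer, or npos.
    std::size_t findBlockEnd() const {
        static const std::string tags[] = {"</directory>", "</scanner>"};
        std::size_t best = std::string::npos;
        for (const std::string& tag : tags) {
            const std::size_t pos = buffer_.find(tag);
            if (pos == std::string::npos) continue;
            const std::size_t end = pos + tag.size();
            if (best == std::string::npos || end < best) best = end;
        }
        return best;
    }

    std::string buffer_;
};

// Splits a block in the middle of a two-byte character and checks that the
// buffer only releases it once it is whole again.
void runExample() {
    const std::string full = "<directory><artist>Caf\xC3\xA9</artist></directory>";
    const std::size_t split = full.find('\xC3') + 1;  // cut inside the 2-byte character
    const std::string first = full.substr(0, split);
    const std::string second = full.substr(split);

    // Decoding either read on its own would hit a broken sequence.
    assert(!isStructurallyValidUtf8(first));
    assert(!isStructurallyValidUtf8(second));

    ScannerOutputBuffer buffer;
    assert(buffer.feed(first).empty());  // no closing tag yet, nothing released
    const std::vector<std::string> blocks = buffer.feed(second);
    assert(blocks.size() == 1);
    assert(blocks[0] == full);
    assert(isStructurallyValidUtf8(blocks[0]));
}
```

Because the closing tags are pure ASCII, a block that ends at one always ends
on a character boundary, which is why decoding whole blocks is safe while
decoding individual reads is not.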

_______________________________________________
Amarok-devel mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/amarok-devel
