-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/104310/#review11510
-----------------------------------------------------------

This review has been submitted with commit 711b585e6284e1346c8661d60e6bf04c0223fd8c by Sam Lade to branch master.

- Commit Hook


On March 17, 2012, 9 a.m., Alexey Neyman wrote:

> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/104310/
> -----------------------------------------------------------
>
> (Updated March 17, 2012, 9 a.m.)
>
>
> Review request for Amarok.
>
>
> Description
> -------
>
> Amarok incorrectly scans files with non-ASCII characters in tags. The
> symptom is that some of the files show two "invalid UTF character" symbols
> instead of a single non-ASCII character (looks like <?><?>, question mark
> inside a black circle). The most visible effect of this issue is that some
> albums end up in Various Artists because one of the tracks had its artist
> name corrupted in this way. It is not limited to the artist name, though:
> there are tracks with corrupted album names or titles.
>
> The reason for this issue is as follows. When Amarok invokes the collection
> scanner process, it receives the results from amarokcollectionscanner over
> a pipe. Here is a snippet of code from
> src/core-impl/collections/db/ScanManager.cpp:
>
>     void
>     ScannerJob::getScannerOutput()
>     {
>         m_incompleteTagBuffer += m_scanner->readAll();
>
>     }
>
> The m_incompleteTagBuffer is declared in
> src/core-impl/collections/db/ScanManager.h:
>
>     QString m_incompleteTagBuffer
>
> However, m_scanner->readAll() returns QByteArray, not QString. This is okay
> for ASCII characters (which are 1 byte in UTF-8), but breaks in the case of
> multibyte sequences. If readAll() returns a block which terminates in the
> middle of a multibyte sequence, the conversion to QString in
> ScannerJob::getScannerOutput replaces the last character with an "invalid
> UTF character" symbol. When the next block is read, it starts in the middle
> of the UTF-8 multibyte sequence, so that part gets replaced with one more
> "invalid UTF character" symbol. Thus, a single multibyte UTF-8 character is
> replaced with two "invalid character" symbols.
>
> The solution implemented by the attached patch is to store incomplete
> information as QByteArray and search for partial ("</directory>") or full
> ("</scanner>") elements in the byte stream before conversion to QString.
> Complete blocks can be safely converted to QString, as the multibyte
> characters are inside the XML tags.
>
>
> Diffs
> -----
>
>   src/core-impl/collections/db/ScanManager.h 5f0d153
>   src/core-impl/collections/db/ScanManager.cpp 97d0b1c
>
> Diff: http://git.reviewboard.kde.org/r/104310/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Alexey Neyman
>
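
The buffering approach described in the review request can be sketched roughly as follows. This is not the attached patch: it uses plain standard C++ so that it runs without Qt, with std::string standing in for QByteArray, and the class name ScannerOutputBuffer, its methods, and the demo function are all hypothetical. The point is only that chunks are accumulated as raw bytes and cut at "</directory>" or "</scanner>", so a multibyte character split across two reads is whole again before anything is decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the scanner job's buffer. Raw bytes are kept
// undecoded (std::string plays the role of QByteArray here), and only
// complete blocks are handed on for UTF-8 decoding.
class ScannerOutputBuffer
{
public:
    // Append one chunk exactly as it came off the pipe; no decoding yet.
    void append(const std::string &chunk) { m_bytes += chunk; }

    // Remove and return every complete block currently buffered. A block
    // ends right after "</directory>" or "</scanner>", whichever closes first.
    std::vector<std::string> takeCompleteBlocks()
    {
        static const std::string closers[] = { "</directory>", "</scanner>" };
        std::vector<std::string> blocks;
        for (;;) {
            std::size_t end = std::string::npos;
            for (const std::string &closer : closers) {
                const std::size_t pos = m_bytes.find(closer);
                if (pos == std::string::npos)
                    continue;
                const std::size_t candidate = pos + closer.size();
                if (end == std::string::npos || candidate < end)
                    end = candidate;
            }
            if (end == std::string::npos)
                break; // only an incomplete tail is left; wait for more bytes
            blocks.push_back(m_bytes.substr(0, end));
            m_bytes.erase(0, end);
        }
        return blocks;
    }

    // Bytes received so far that do not yet form a complete block.
    const std::string &pending() const { return m_bytes; }

private:
    std::string m_bytes;
};

// Usage sketch: "é" is the two bytes 0xC3 0xA9, and the pipe delivers them
// in separate chunks.
inline void demoSplitCharacter()
{
    ScannerOutputBuffer buffer;
    buffer.append("<directory><artist>Caf\xC3");
    assert(buffer.takeCompleteBlocks().empty()); // nothing complete yet

    buffer.append("\xA9</artist></directory>");
    const std::vector<std::string> blocks = buffer.takeCompleteBlocks();
    assert(blocks.size() == 1);
    // The two halves of the character are adjacent again before any decoding.
    assert(blocks[0] == "<directory><artist>Caf\xC3\xA9</artist></directory>");
}
```

In a Qt setting, each returned block would presumably be passed to something like QString::fromUtf8() only at this point; decoding chunk by chunk, as the original code did implicitly, is what produced the pair of replacement characters.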
