-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/104310/
-----------------------------------------------------------
Review request for Amarok.
Description
-------
Amarok incorrectly scans files with non-ASCII characters in tags. The symptom
is that some of the files show two "invalid UTF character" symbols instead of
a single non-ASCII character (it looks like <?><?>, a question mark inside a
black circle). The most visible effect of this issue is that some albums end
up in Various Artists because one of the tracks had its artist name corrupted
in this way. It is not limited to artist names, though: there are also tracks
with corrupted album names or titles.

The reason for this issue is as follows. When Amarok invokes the collection
scanner process, it receives the results from amarokcollectionscanner over a
pipe. Here is a snippet of code from
src/core-impl/collections/db/ScanManager.cpp:

    void
    ScannerJob::getScannerOutput()
    {
        m_incompleteTagBuffer += m_scanner->readAll();
    }

The m_incompleteTagBuffer member is declared in
src/core-impl/collections/db/ScanManager.h:

    QString m_incompleteTagBuffer;

However, m_scanner->readAll() returns a QByteArray, not a QString, so each
block is converted to QString separately as it is appended. This is fine for
ASCII characters (which are 1 byte in UTF-8), but breaks on multibyte
sequences. If readAll() returns a block which ends in the middle of a
multibyte sequence, the conversion to QString in
ScannerJob::getScannerOutput() replaces the incomplete trailing bytes with an
"invalid UTF character" symbol. The next block then starts in the middle of
that UTF-8 multibyte sequence, so its leading bytes get replaced with one more
"invalid UTF character" symbol. Thus, a single multibyte UTF-8 character is
replaced with two "invalid UTF character" symbols.
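
The failure described above can be reproduced without Qt. The following is
only a minimal sketch: countReplacements() is a hypothetical, simplified
stand-in for a lenient UTF-8 decoder (it does not reject overlong encodings
and is not Qt's actual conversion code), and std::string stands in for
QByteArray. It shows that decoding two chunks separately yields two
replacement symbols, while decoding the joined bytes yields none.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of a UTF-8 sequence judged by its lead byte.
// Returns 0 for a byte that cannot start a sequence (e.g. a continuation byte).
static std::size_t expectedLength(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

// Hypothetical helper: counts how many "invalid character" symbols a
// simplified lenient decoder would emit for the given bytes.
static std::size_t countReplacements(const std::string &bytes)
{
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::size_t len = expectedLength(static_cast<unsigned char>(bytes[i]));
        bool ok = len != 0 && i + len <= bytes.size();
        for (std::size_t k = 1; ok && k < len; ++k)
            ok = (static_cast<unsigned char>(bytes[i + k]) & 0xC0) == 0x80;
        if (ok) {
            i += len;   // complete, well-formed sequence
        } else {
            ++count;    // truncated or stray byte becomes one symbol
            ++i;
        }
    }
    return count;
}

// "é" is the two bytes 0xC3 0xA9 in UTF-8; split it across two chunks.
static void splitExample()
{
    const std::string first = "caf\xC3";      // chunk ends mid-character
    const std::string second = "\xA9</tag>";  // next chunk starts mid-character
    assert(countReplacements(first) + countReplacements(second) == 2);
    assert(countReplacements(first + second) == 0);
}
```

Calling splitExample() passes both assertions: the damage comes purely from
where the chunk boundary falls, not from the data itself.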

The solution implemented by the attached patch is to store the incomplete
data as a QByteArray and to search for partial ("</directory>") or full
("</scanner>") elements in the byte stream before conversion to QString.
Complete blocks can be safely converted to QString, because every multibyte
character lies entirely inside the XML elements, so a block that ends at a
closing tag never splits one.
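
The idea can be sketched as follows. This is not the attached patch, only an
illustration under stated assumptions: std::string stands in for QByteArray,
and takeCompleteBlocks() is a hypothetical helper name. Raw bytes are
accumulated, and only the prefix that ends at the last closing tag is handed
on for conversion to text; the remainder stays in the byte buffer until more
data arrives.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes and returns the longest prefix of `buffer`
// that ends with "</directory>" or "</scanner>". Returns an empty string
// (and leaves `buffer` untouched) if no closing tag has arrived yet.
static std::string takeCompleteBlocks(std::string &buffer)
{
    static const std::string tags[] = { "</directory>", "</scanner>" };
    std::size_t end = 0;
    for (const std::string &tag : tags) {
        const std::size_t pos = buffer.rfind(tag);
        if (pos != std::string::npos && pos + tag.size() > end)
            end = pos + tag.size();
    }
    // The returned bytes end on an ASCII closing tag, so they cannot end
    // in the middle of a multibyte UTF-8 sequence.
    std::string complete = buffer.substr(0, end);
    buffer.erase(0, end);
    return complete;
}
```

In a reader callback, each new chunk would be appended to the byte buffer
first, and only a non-empty result of takeCompleteBlocks() would be converted
to text and parsed.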
Diffs
-----
src/core-impl/collections/db/ScanManager.h 5f0d153
src/core-impl/collections/db/ScanManager.cpp 97d0b1c
Diff: http://git.reviewboard.kde.org/r/104310/diff/
Testing
-------
Thanks,
Alexey Neyman