Am Thu, Dec 07, 2023 at 07:59:38PM +0100 schrieb Lucas Nussbaum:
> On 07/12/23 at 09:58 +0100, Andreas Tille wrote:
> > Hi,
> >
> > by chance I realised that the uploaders table contains some names where
> > names
> > are not stripped:
> >
> > udd=> select '"' || u.name || '"' as name_with_spaces, uploader from
> > uploaders u where name like '% ' or name like ' %' ;
> > name_with_spaces | uploader
> > --------------------------+-------------------------------------------
> > " Mehdi Dogguy" | Mehdi Dogguy <[email protected]>
> > " David Paleino" | David Paleino <[email protected]>
> > " Stéphane Glondu" | Stéphane Glondu <[email protected]>
> > " Stefano Zacchiroli" | Stefano Zacchiroli <[email protected]>
> > " Stefano Zacchiroli" | Stefano Zacchiroli <[email protected]>
> > " Stefano Zacchiroli" | Stefano Zacchiroli <[email protected]>
> > " Stefano Zacchiroli" | Stefano Zacchiroli <[email protected]>
> > " Stefano Zacchiroli" | Stefano Zacchiroli <[email protected]>
> > "Andreas Tille " | Andreas Tille <[email protected]>
> > " LI Daobing" | LI Daobing <[email protected]>
> > " David Paleino" | David Paleino <[email protected]>
> > " Stefano Zacchiroli" | Stefano Zacchiroli <[email protected]>
> > " Nikita V. Youshchenko" | Nikita V. Youshchenko <[email protected]>
> > " Nikita V. Youshchenko" | Nikita V. Youshchenko <[email protected]>
> > " Nikita V. Youshchenko" | Nikita V. Youshchenko <[email protected]>
> > " Nikita V. Youshchenko" | Nikita V. Youshchenko <[email protected]>
> > " Nikita V. Youshchenko" | Nikita V. Youshchenko <[email protected]>
> > "Colin Tuckley " | Colin Tuckley <[email protected]>
> > "Colin Tuckley " | Colin Tuckley <[email protected]>
> > "Colin Tuckley " | Colin Tuckley <[email protected]>
> > (20 rows)
> > ...
> > UPDATE uploaders SET name = trim(name), uploader = trim(name) || ' ' ||
> > email WHERE name like ' %' or name like '% ' ;
> >
>
> Uploaders is refreshed every few hours from archive data, so a one-time
> UPDATE would not help. UDD usually tries to preserve inaccuracies, so
> those might be interesting for QA work.
OK.
> In your case, why don't you use the email address to identify uploaders?
Since this also does not work:
udd=> SELECT count(*), uploader FROM uploaders WHERE name ilike '%tille%' GROUP
BY uploader;
count | uploader
-------+------------------------------------
1 | Andreas Tille <[email protected]>
1 | Andreas Tille <[email protected]>
8785 | Andreas Tille <[email protected]>
(3 Zeilen)
> (possibly combining it with the carnivore data to identify different emails
> belonging to the same person ?)
I could fiddle around with carnivore but that's overkill for thst
purpose and I insist that not stripping blanks from names does not make
any sense, IMHO. (1 Zeile)
BTW: I found
udd=> SELECT count(*), name FROM (SELECT CASE WHEN changed_by_name = '' THEN
maintainer_name ELSE changed_by_name END AS name FROM upload_history) uh WHERE
name ilike '%tille%' group by name;
count | name
-------+---------------
16524 | Andreas Tille
(1 Zeile)
So why do I have 8707 uploads per uploaders but 16524 per upload_history?
Is my assumption wrong that both values should match (modulo some wrongly
spelled names)
Kind regards
Andreas.
--
http://fam-tille.de