Hi,

Thanks for the response.

No, I don’t have the ‘true’ values, and that is what makes this challenging.

I like the second part of your suggestion, thanks for that.

Mohit

On Fri, 26 Feb 2021 at 08:15, m.gufranpathan <[email protected]>
wrote:

> Hi,
>
> I'm assuming you know the "true" value of each company's classification
> (i.e. you know whether the company is actually "valid" or "invalid").
>
> If that's the case, you can calculate accuracy as the number of correct
> classifications divided by the total number of classifications. Two other
> metrics are commonly used: precision and recall. They are useful if the
> cost of incorrectly predicting a "Valid" company is higher (or lower)
> than the cost of incorrectly predicting an "Invalid" company.
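
To make the three metrics concrete, here is a minimal sketch in plain Python. It assumes a hand-labelled sample with the true flag for each company exists; the thread says no such labels are available yet, so the `truth` list below is purely hypothetical, as are the function name and the toy flags. "Valid" is treated as the positive class.

```python
def classification_metrics(true_flags, predicted_flags, positive="Valid"):
    """Return accuracy, precision and recall for one algorithm's flags."""
    if len(true_flags) != len(predicted_flags):
        raise ValueError("both lists must have the same length")
    pairs = list(zip(true_flags, predicted_flags))
    # True positives: truly Valid and flagged Valid.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: flagged Valid but truly Invalid.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: truly Valid but flagged Invalid.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical hand-labelled sample (not taken from the shared file).
truth = ["Valid", "Valid", "Invalid", "Invalid", "Valid"]
old_algo = ["Valid", "Invalid", "Invalid", "Valid", "Valid"]
new_algo = ["Valid", "Valid", "Invalid", "Valid", "Valid"]

old_metrics = classification_metrics(truth, old_algo)
new_metrics = classification_metrics(truth, new_algo)
print("old:", old_metrics)
print("new:", new_metrics)
```

Precision answers "of the companies flagged Valid, how many really are?", while recall answers "of the truly Valid companies, how many were flagged Valid?". Which one matters more depends on which kind of mistake is costlier.
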
>
> Happy to help if you have more questions / clarifications.
>
> -------- Original message --------
> From: Mohit K <[email protected]>
> Date: 2/21/21 4:01 PM (GMT+04:00)
> To: [email protected]
> Subject: [datameet] Comparing 2 Classification Algos
>
> Hi All,
>
> I am trying to evaluate two algorithms that classify companies based on
> certain criteria, and I need to determine which one does a better job of
> classification. The data file is at the Google Drive link below. The
> results of the two algorithms are in columns J and K.
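
A possible first step, sketched under assumptions: cross-tabulate the two flag columns to see how often the algorithms agree, and list the rows where they disagree so those can be checked by hand. The function names and the toy flag lists are hypothetical; the real values would come from columns J and K of the shared file. This only measures agreement, not which algorithm is right.

```python
from collections import Counter


def cross_tabulate(old_flags, new_flags):
    """Count each (old flag, new flag) combination."""
    if len(old_flags) != len(new_flags):
        raise ValueError("both columns must have the same number of rows")
    return Counter(zip(old_flags, new_flags))


def disagreements(old_flags, new_flags):
    """Return the row indices where the two algorithms give different flags."""
    return [i for i, (o, n) in enumerate(zip(old_flags, new_flags)) if o != n]


# Hypothetical values standing in for column J (old) and column K (new).
old_flags = ["Valid", "Invalid", "Invalid", "Valid"]
new_flags = ["Valid", "Valid", "Invalid", "Invalid"]

table = cross_tabulate(old_flags, new_flags)
to_review = disagreements(old_flags, new_flags)
for (old, new), count in sorted(table.items()):
    print(f"old={old:8s} new={new:8s} rows={count}")
print("rows to review by hand:", to_review)
```
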
>
>
> https://drive.google.com/file/d/1ZFtknedWZANrQQVgVqYxGMtCkhOvJ8hK/view?usp=sharing
>
> Could anyone with a data analytics background suggest how to approach
> this?
>
> Thanks,
> Mohit
>
> Some of the details are given below:
>
> The file contains data on companies. Each row is a company in your
> company database. As you are aware, there are a lot of duplicated
> companies, so the old algorithm marks them as “Invalid” in Column J
> (“Flag”). The new algorithm's flags are listed in Column K. When a flag
> says Valid, the algorithm has determined that the company is a good
> company, a real company and not duplicated, and should be kept in the
> database.
>
> There are additional data in the file for each of the companies to help
> you evaluate the companies.
>
> Some issues:
>
>
> 1) Some companies have many legitimate subsidiaries.  Like Google and
> YouTube might be 2 companies but YouTube is a subsidiary of Google.  What
> you have decided to do is that you want these to stay in your database as 2
> separate companies, if these 3 conditions are met:
> a) the subsidiary is large, with more than $100M in revenue,
> b) the name of that company looks substantially different from the parent,
> and
> c) the identity of the subsidiary still exists, because sometimes the
> parent company just absorbs the subsidiary and the subsidiary
> disappears, i.e. its website no longer exists.
> In the Google / YouTube example, all three conditions are met, so both
> Google and YouTube are kept as different companies in your database.
>
> 2) There are many big companies that often have hundreds of subsidiaries
> that are all pretty much the same company.  For example, Citibank can have
> many subsidiaries like Citibank Auto Loans, Citibank New York, Citibank
> Florida, and those typically look like the same company to most consumers,
> so you do not want to keep all those subsidiaries but just to keep the main
> parent company.
>
> 3) Sometimes the database contains multiple listings that are exactly
> the same company, i.e. the company name is the same and the URL is the
> same. In those cases, you want to keep the listing that has the most
> information (e.g. revenue, employee count), the highest revenue, etc.,
> and remove the ones with less.
>
> 4) Information is often wrong, so of course you want to keep the
> database listing with the most accurate information.
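
Rules 1 and 3 above can be sketched as small Python functions, which may help when checking individual rows against each algorithm's flag. Everything here is an assumption made for illustration: the function names, the record keys (`revenue`, `employees`), the use of `difflib.SequenceMatcher` with a 0.5 cut-off as a stand-in for "looks substantially different", and the tie-break order in rule 3 (most filled-in fields first, then highest revenue).

```python
from difflib import SequenceMatcher


def keep_subsidiary(sub_name, parent_name, revenue_usd, website_live,
                    min_revenue=100_000_000, max_name_similarity=0.5):
    """Rule 1: keep a subsidiary only if all three conditions hold."""
    similarity = SequenceMatcher(
        None, sub_name.lower(), parent_name.lower()
    ).ratio()
    # Assumed proxy for "name looks substantially different from the parent".
    names_differ = similarity < max_name_similarity
    return bool(revenue_usd > min_revenue and names_differ and website_live)


def pick_best_duplicate(records):
    """Rule 3: among listings with the same name and URL, keep the fullest."""
    def score(record):
        filled = sum(1 for value in record.values() if value not in (None, ""))
        # Assumed tie-break: more filled-in fields first, then higher revenue.
        return (filled, record.get("revenue") or 0)
    return max(records, key=score)


# Hypothetical figures, used only to exercise the functions.
youtube_kept = keep_subsidiary("YouTube", "Google", 1_000_000_000, True)
citi_sub_kept = keep_subsidiary("Citibank Auto Loans", "Citibank",
                                1_000_000_000, True)

duplicates = [
    {"name": "Acme", "url": "acme.com", "revenue": None, "employees": None},
    {"name": "Acme", "url": "acme.com", "revenue": 5_000_000, "employees": 40},
    {"name": "Acme", "url": "acme.com", "revenue": 7_000_000, "employees": None},
]
best = pick_best_duplicate(duplicates)
print(youtube_kept, citi_sub_kept, best)
```

String similarity is a crude measure of how different two names look, so the cut-off would need tuning against rows that have been judged by hand.
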
>
> --
> Datameet is a community of Data Science enthusiasts in India. Know more
> about us by visiting http://datameet.org
> ---
> You received this message because you are subscribed to the Google Groups
> "datameet" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/datameet/CAJk6f4AftY473Q2ohbGzdPekhm7iybe349F69Xcqy_ZBrO3gTw%40mail.gmail.com
> <https://groups.google.com/d/msgid/datameet/CAJk6f4AftY473Q2ohbGzdPekhm7iybe349F69Xcqy_ZBrO3gTw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
>
