On Fri, Apr 9, 2021 at 7:20 PM Tenjin <jdiscordbusin...@gmail.com> wrote:

> Basically I am making a script to query an api for some json data, if I
> run it synchronously it works fine and works as intended but it needs to do
> this give or take 15thousand times so I thought I would use concurrency to
> get the job done. It works fine for about 4 thousand queries then it stops
> working and gives me this error "socket: too many open files" when I did
> research on the issue I implemented everything they did I am consuming the
> response body and then I am closing it once I am finished as well as
> letting my wait group know I am finished to exit the go routine.


I'm a bit of a bat in a vacuum here, so I'm going to guess and give some
general rather than concrete advice. I hope it's enough to point you in the
right direction so you can solve your problem.

Unix uses "file descriptors" for files and also for network connections, so
they come from the same pool of resources. An account is given a number of
these to use, and in a virtualized environment it can be quite limited. You
can check your limits with something like `ulimit -a`. This is the
background for the error you are seeing. In general, it can be good to keep
this artificially low in dev environments so you run into the limits
quicker and can act before hitting production.
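
If it helps, you can also read the limit from inside your program. Here's a
minimal sketch, assuming a Unix-like system (it reports roughly what
`ulimit -n` does):

package main

import (
	"fmt"
	"syscall"
)

func main() {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		panic(err)
	}
	// Cur is the soft limit your process actually hits; Max is the hard ceiling.
	fmt.Printf("open files: soft=%d hard=%d\n", rl.Cur, rl.Max)
}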

The particular problem can happen in a number of ways: because you are
leaking network connections, or because you are leaking files (and it just
so happens you run out of network connections first). It can also happen
because connections "linger" for a little while after you've used them
(HTTP clients keep idle connections open for reuse, for example), or
because you open new connections while the old ones are still in use, so
they stack up. The garbage collector can't be relied on for cleaning up,
because it may take a while before it runs, and you want to give resources
back quickly so they can be reused.
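
As a minimal sketch of what consuming and closing the body looks like
(the URL is just a placeholder, and io.ReadAll assumes Go 1.16+):

package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetch reads the whole body and closes it; doing both is what lets the
// default http.Transport put the connection back into its idle pool.
func fetch(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body) // consume the body completely
}

func main() {
	body, err := fetch("https://api.example.com/item/1") // placeholder URL
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(body), "bytes")
}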

A good first step is to verify that this is actually what is happening.
Look up the `netstat` or `lsof` commands (on Linux) to get a view of the
open files and open network connections. From that you can learn what kind
of resource you are leaking, and then drill down into where the problem is.

Any API has a limit to how much it can handle, and your client has limits
as well. Hence it is good style to put some kind of concurrency limit on
your connections. Either spawn a fixed number of workers that all read work
from the same channel, or use a buffered channel as a limiter: you only
start new work when there's space to put a token on the channel, and when
you are done, you take a single token back out. Bryan C. Mills gave an
excellent talk at GopherCon 2018 about this:
https://youtu.be/5zXAHh5tJqQ?t=1641 where I've linked to the point where he
discusses it, but the whole talk is highly recommended. The reason Bryan's
suggestion is good is that it also bounds the number of goroutines you
have, which makes debugging far easier.
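
Here is a minimal sketch of the token-channel version; the limit of 50 and
the queryAPI stub are assumptions you'd replace with your real request code
and whatever rate the API can tolerate:

package main

import (
	"fmt"
	"sync"
)

// queryAPI is a stand-in for your real request/decode logic.
func queryAPI(url string) {
	fmt.Println("would query", url)
}

func main() {
	urls := []string{"https://api.example.com/item/1"} // your ~15k URLs go here

	sem := make(chan struct{}, 50) // at most 50 requests in flight
	var wg sync.WaitGroup

	for _, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // take a slot; blocks while 50 requests are in flight
		go func(u string) {
			defer wg.Done()
			defer func() { <-sem }() // give the slot back when done
			queryAPI(u)
		}(u)
	}
	wg.Wait()
}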

Some APIs have overload protection: they'll begin returning 429 Too Many
Requests or something similar. Make sure you have a system which detects
this and stops hammering the API when it happens.
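
A rough sketch of detecting that case; the one-second sleep and the
errThrottled sentinel are placeholders for whatever retry policy you
actually want:

package main

import (
	"errors"
	"io"
	"net/http"
	"time"
)

// errThrottled is a hypothetical sentinel meaning "slow down and retry later".
var errThrottled = errors.New("throttled by API")

// fetchOnce does a single request and reports throttling to the caller.
func fetchOnce(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusTooManyRequests {
		io.Copy(io.Discard, resp.Body) // drain so the connection can still be reused
		time.Sleep(time.Second)        // placeholder backoff; honoring Retry-After is better
		return nil, errThrottled
	}
	return io.ReadAll(resp.Body)
}

func main() {
	body, err := fetchOnce("https://api.example.com/item/1") // placeholder URL
	if errors.Is(err, errThrottled) {
		// re-queue the URL and slow the whole pipeline down here
	}
	_ = body
}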

You can use the channel to test your setup as well. Set the limit to 1 and
you should get the synchronous behavior back, with resource usage staying
steady when you analyze it. Then you can try bumping it to 2, 4, 8, ... or
whatever you feel is fair to the API.

-- 
J.
