yrenat opened a new issue, #3739:
URL: https://github.com/apache/texera/issues/3739

   The architecture consists of four primary AWS components working in concert:
   
   **Application Load Balancer (ALB):** A Layer 7 load balancer serving as the 
ingress point. It listens for HTTP requests and routes them to registered 
targets within a specified target group (this is one of the biggest differences 
from Lambda).
   
   **Amazon ECS Service:** A logical controller that maintains a declared 
number of task instantiations (`Desired Count`) from a specific task 
definition. It is responsible for service discovery and integrating with the 
ALB and Application Auto Scaling.
   
   **AWS Fargate:** A serverless compute engine that provides the underlying 
compute plane for the ECS tasks. It abstracts away all server management.
   
   **Application Auto Scaling & CloudWatch:** The monitoring and control loop. 
CloudWatch tracks the `ECSServiceAverageCPUUtilization` metric, and its alarms 
trigger scaling actions defined in Application Auto Scaling.
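   The wiring between these components can be sketched as a target-tracking policy configuration. The cluster/service names and the 70% target below are illustrative assumptions, not values fixed by this issue; in practice a dict like this is passed to Application Auto Scaling's `put_scaling_policy` API (e.g. via boto3):
   
   ```python
   # Illustrative target-tracking configuration for an ECS service on Fargate.
   # ResourceId components are hypothetical placeholders.
   scaling_policy = {
       "PolicyName": "cpu-target-tracking",  # hypothetical policy name
       "ServiceNamespace": "ecs",
       "ResourceId": "service/texera-cluster/texera-service",  # hypothetical
       "ScalableDimension": "ecs:service:DesiredCount",
       "PolicyType": "TargetTrackingScaling",
       "TargetTrackingScalingPolicyConfiguration": {
           # Keep the service's average CPU near 70%; Application Auto Scaling
           # creates the CloudWatch alarms for this metric automatically.
           "PredefinedMetricSpecification": {
               "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
           },
           "TargetValue": 70.0,
           "ScaleOutCooldown": 60,   # seconds to wait after a scale-out
           "ScaleInCooldown": 120,   # seconds to wait after a scale-in
       },
   }
   
   print(scaling_policy["TargetTrackingScalingPolicyConfiguration"]["TargetValue"])
   ```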
   
   ### The Role of AWS Fargate
   
   When the ECS Service needs to launch a task (either at initial deployment or 
during a scale-out event), it dispatches the request to Fargate. When a task is 
no longer needed, the ECS Service instructs Fargate to terminate it, and 
Fargate reclaims the compute resources. The key point is that Fargate handles 
the entire flow; there is no need to manually allocate or de-allocate 
resources. 
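   The scheduler's behavior described above can be pictured as a reconciliation loop. This is a toy model for intuition only, not AWS code; the task names stand in for `RunTask`/`StopTask` calls:
   
   ```python
   def reconcile(desired_count: int, running_tasks: list) -> list:
       """Toy model: the ECS service reconciles Running Count with
       Desired Count by asking Fargate to launch or stop tasks."""
       tasks = list(running_tasks)
       while len(tasks) < desired_count:       # scale-out: launch via Fargate
           tasks.append("task-%d" % len(tasks))  # stands in for RunTask
       while len(tasks) > desired_count:       # scale-in: terminate via Fargate
           tasks.pop()                           # stands in for StopTask
       return tasks
   
   print(reconcile(3, ["task-0"]))       # scales out to three tasks
   print(reconcile(1, ["a", "b", "c"]))  # scales in to one task
   ```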
   
   ### Some Other Key Points
   
   The following sequence details the auto-scaling process from load generation 
to system stabilization:
   
   1.  **Auto Scale-up/Scale-down:** There are three common metrics for setting 
the scale-up/scale-down threshold, and CPU usage is the most widely used. Users 
specify a target CPU utilization percentage for the service's tasks. If usage 
exceeds that threshold while the service is running, additional tasks (and 
therefore more compute resources) are launched. Other metrics include RAM usage 
and the number of incoming requests.
   2.  **Alarm State Transition:** The Target Tracking scaling policy creates a 
CloudWatch alarm. When the `ECSServiceAverageCPUUtilization` metric exceeds the 
defined target value (e.g., 70%) for the configured number of evaluation 
periods (e.g., 3 consecutive periods of 1 minute), the alarm's state 
transitions from `OK` to `IN_ALARM`.
   3.  **Scaling Policy Invocation:** The `IN_ALARM` state triggers the 
associated Application Auto Scaling policy. The policy calculates the required 
number of new tasks needed to bring the average CPU back down to the target 
value. It then makes an API call to update the `Desired Count` of the ECS 
Service.
   4.  **Fargate Task Launch:** The ECS Service scheduler detects that the 
`Running Count` is less than the new `Desired Count`. It invokes the `RunTask` 
API with the Fargate launch type, providing the task definition and network 
configuration. Fargate then provisions and launches the new task.
   5.  **Load Rebalancing:** Once the new Fargate task is running and passes 
the ALB's health checks, the ALB adds its ENI's private IP to its list of 
active targets. It immediately begins routing a portion of the incoming 
requests to this new task, thereby reducing the CPU load on the original task 
and stabilizing the service's overall average CPU.
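   The scale-out calculation in step 3 roughly follows the target-tracking proportionality rule. This is a simplification of what Application Auto Scaling actually computes, shown here only to make the arithmetic concrete:
   
   ```python
   import math
   
   def desired_task_count(current_count: int, current_cpu: float,
                          target_cpu: float) -> int:
       """Approximate target-tracking math: grow or shrink the task count
       in proportion to how far observed average CPU is from the target."""
       return max(1, math.ceil(current_count * current_cpu / target_cpu))
   
   # Two tasks averaging 95% CPU against a 70% target -> scale out to three.
   print(desired_task_count(2, 95.0, 70.0))
   # Four tasks averaging 35% CPU against a 70% target -> scale in to two.
   print(desired_task_count(4, 35.0, 70.0))
   ```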
   
   
   
   ### Some More Things to Know
   
   1. **A very common misunderstanding:** One might expect that with a 
CPU-usage threshold, the Fargate service could never wake up: while the service 
is shut down it cannot take in any requests, so nothing would ever push it over 
the threshold (there are literally no requests that can reach a sleeping 
service). **HOWEVER, that's wrong!** AWS Fargate can be configured so that an 
incoming request always wakes the system up, because the ALB keeps accepting 
requests even while no task is running, and those requests can still trigger a 
scale-out.
   2. There are three predefined metrics for auto scaling a Fargate service up 
or down: the threshold can be set on CPU usage, RAM usage, or the number of 
requests coming into the system. Note that even if no user requests are coming 
in, CPU and RAM may still not be "vacant" enough to drop below the threshold, 
because the system itself consumes resources for its own management and 
housekeeping. A more common and safer approach is therefore to set the 
threshold on the number of incoming requests. 
   3. You can always test the scalability of the service by sending an HTTP 
request directly to the ALB's address. In fact, you can open that HTTP address 
in a browser: even though it will most likely report that the service is 
unavailable, something should show up (such as an error response from the ALB) 
instead of a blank page. 
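   Under the assumption that request-based scaling is preferred (point 2 above), the metric specification from earlier could be swapped for `ALBRequestCountPerTarget`. The target-group resource label below is a made-up placeholder, and the 100-request target is an arbitrary example:
   
   ```python
   # Hypothetical request-count-based target tracking configuration.
   # The ResourceLabel ties the metric to one ALB target group; the ARN
   # fragments here are placeholders, not real resources.
   request_scaling_config = {
       "PredefinedMetricSpecification": {
           "PredefinedMetricType": "ALBRequestCountPerTarget",
           "ResourceLabel": (
               "app/my-alb/0123456789abcdef/"
               "targetgroup/my-tg/0123456789abcdef"
           ),
       },
       # Scale so each task handles roughly 100 requests per period.
       "TargetValue": 100.0,
   }
   
   print(request_scaling_config["PredefinedMetricSpecification"]
         ["PredefinedMetricType"])
   ```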

